ABSTRACT


Parallel computing is the wave of the future. As the need for computational power 
increases, one processor is no longer sufficient to achieve the speed necessary to solve 
today’s complex problems. 

The Air Force Space Command (AFSPACECOM) tracks approximately 8000 
satellites daily; the model used by the AFSPACECOM, SGP4 (Simplified General 
Perturbation Model Four), has been the operational model since 1976. This thesis 
contains a detailed discussion of the mathematical theory of the SGP4 model. 

The tracking of a satellite requires extensive calculations. The satellite can be 
tracked more efficiently with parallel processing techniques. The principles developed 
are applicable to a Naval ship tracking multiple incoming threats; the increase in the
speed of processing incoming data would result in personnel being informed faster and 
thus allow more time for better decisions during combat. 

Three parallel algorithms applied to SGP4 for implementation on a Parallel 
Virtual Machine (PVM) are developed. PVM is a small software package that allows a 
network of computer workstations to appear as a single large distributed-memory parallel 
computer. This thesis contains a description of several algorithms for the implementation 
on PVM to track satellites, the optimal number of workstations, and methods of
distributing data.
I. INTRODUCTION


The goal of this thesis is to illustrate how a network of IPX Sunstations can be used as a
parallel computer to solve a complex military requirement of tracking 8000 earth satellites daily.
Parallel processing has already been used in Global Climate Modeling, Superconductivity,
Seismic Imaging, and many other important applications in science today. Additionally, there are
other important military applications where the use of parallel computing would be extremely
advantageous. For example, today's Weapon Control Systems like AEGIS have enormous
computational requirements to detect and destroy incoming threats. The use of separate
computers located at individual enclaves versus a centrally located computer will reduce the
vulnerability of a ship should it take a direct hit in the computer station. The necessary
computing power can be maintained by switching to unaffected stations; additionally, the increase in
speed of processing incoming data would result in personnel being informed faster and thus allow
more time for better decisions during combat.

Parallel computing is the wave of the future. As the need for computational power
increases daily, due to an increase in technological developments, one processor is no longer 
sufficient to achieve the speed in computations necessary to solve today’s problems. 

Two ways one can achieve greater computational efficiency with parallel processing are 

1. Purchase a computer developed solely for parallel processing applications 
or 
2. Use existing workstations found in most companies today. 

The first option requires the purchase of a computer like the INTEL iPSC/2 Hypercube
multicomputer. The INTEL iPSC/2 Hypercube at the Naval Postgraduate School was purchased in
1987 for about $100,000.00; the Hypercube requires an additional $6000.00 per year to maintain
and is used solely for research projects.

The second option, the use of existing workstations, requires only that one be willing to
utilize the power of idle workstations' CPUs to achieve computational efficiency by dividing a
complex problem into smaller, more manageable data components.

The average computer user in the workplace today does not require 100 % of the CPU's 
power each hour of the day; additionally, at night the workstations remain idle until one logs in 
the next morning or after the weekend. 

The utilization of thousands of existing processors to solve problems with enormous 
computational requirements will be common practice in the future. The price/performance
advantage of this practice has not yet been fully realized; however, tomorrow's scientist will 
wonder how we achieved the advances in science and technology today with the use of serial 
processing alone. 

Once one realizes that there is a storehouse of computer power ready to be distributed 
freely, the next step is to learn how to utilize this power. This thesis will illustrate how a network 
of workstations can be used to increase the speed at which satellites are tracked. This work will 
become increasingly more important as the number of objects tracked daily steadily increases 
and the number of calculations required skyrockets. 

This is a continuation of the Parallel Processing Orbital Prediction work conducted at the
Naval Postgraduate School in the Mathematics Department orchestrated by Professors D.A.
Danielson and B. Neta. In June 1992, Warren E. Phipps, Jr. developed several parallel
algorithms for the Naval Space Surveillance Center's analytic satellite motion model. The model
is implemented in the FORTRAN subroutine PPT2. The algorithms were implemented on the
INTEL iPSC/2 Hypercube (Phipps, 1992). In March 1993, Sara Ostrom studied the parallel
computing potential of the Air Force Space Command analytic satellite motion model
implemented on the INTEL iPSC/2 Hypercube (Ostrom, 1993). Currently, Leon Stone is
implementing parallel algorithms for the Navy's satellite model using Parallel Virtual Machines.
This body of work is the result of the implementation of the Air Force Space Command's
analytic satellite model, SGP4, using Parallel Virtual Machines.

Chapter II discusses the advantage of the Parallel Virtual Machine (PVM) in terms of 
cost, availability and fault tolerance factors. The history and components of PVM are discussed 
followed by a brief overview of a new extension to PVM called HeNCE. The chapter concludes 
with a short discussion of other parallel software packages available like Express, P4, and Linda. 
Chapter III describes the Air Force Space Command's analytical models SGP and SGP4 and
describes, in detail, the theory behind the prediction of a satellite's position and velocity.
Chapter IV describes three algorithms developed to study the parallelization of the satellite
computer code; additionally, the performance of each algorithm is compared and analyzed in
detail. The last chapter, Chapter V, contains conclusions and suggestions for further research.


II. PARALLEL VIRTUAL MACHINE


In this chapter, the advantages of using a Parallel Virtual Machine (PVM) in 
terms of cost, availability, and fault tolerance factors will be discussed. The history and 
components of PVM will be covered followed by a brief overview of a new extension to 
PVM called the Heterogeneous Network Computing Environment (HeNCE). Finally, 
other software packages like Express, P4, and Linda will be briefly described. This is a 
synthesis of papers written about the Parallel Virtual Machine (see Dongarra, Geist,
Manchek, and Sunderam, 1993).

Parallel Virtual Machine is a small software package (~ Mbyte of C source code)
that allows a heterogeneous network of Unix-based computers to appear as a single large 
distributed-memory parallel computer. The PVM package is good for large-grain 
parallelism; that is, at least 100K bytes/node. The term virtual machine is used to
designate a logical distributed-memory computer and host is used to designate one of the 
member computers. 

The PVM software supplies the functions to automatically start up tasks on the 
virtual machine and allows the tasks to communicate and synchronize with each other. 
Note that a task is a unit of computation in PVM and is analogous to a UNIX process.

A problem can be solved in parallel by sending and receiving messages to
accomplish multiple tasks. These message-passing constructs are common to most
distributed-memory computers. By sending and receiving messages, multiple tasks of an
application can cooperate to solve a problem in parallel. The applications can be written
in Fortran 77 or C.
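
The cooperation described above can be pictured with a small master/worker sketch. It is
only an illustration written in Python: threads and queues stand in for PVM tasks and
messages, and the names worker and master are hypothetical. In a PVM 3 program the
corresponding steps would be made with library calls such as pvm_spawn, pvm_send, and
pvm_recv from C or Fortran.

```python
import queue
import threading


def worker(task_queue, result_queue):
    """Receive work messages until a None sentinel arrives, then exit."""
    while True:
        message = task_queue.get()            # blocking receive
        if message is None:                   # sentinel: no more work
            break
        task_id, value = message
        result_queue.put((task_id, value * value))   # send the result back


def master(values, n_workers=3):
    """Distribute values among workers and gather the squared results in order."""
    task_queue, result_queue = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=worker, args=(task_queue, result_queue))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    for task_id, value in enumerate(values):
        task_queue.put((task_id, value))      # one message per unit of work
    for _ in workers:
        task_queue.put(None)                  # one sentinel per worker
    results = [None] * len(values)
    for _ in values:
        task_id, squared = result_queue.get() # results may arrive in any order
        results[task_id] = squared
    for w in workers:
        w.join()
    return results
```

The master tags each message with a task identifier so that results can be reassembled
regardless of which worker finishes first.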

PVM handles all message conversion that may be required if two computers use
different data representations. PVM also includes many control and debugging features in 
its user-friendly interface. For instance, PVM ensures that error messages generated on a 
remote computer are displayed on the user's local screen. 

PVM allows these application tasks to choose the architecture best suited to the 
solution. PVM also supports heterogeneity at the machine and network levels. 

At the machine level, computers with different data formats are supported as well as 
different serial, vector, and parallel architectures. At the network level, different network 
types can make up a Parallel Virtual Machine, for example, Ethernet, Fiber Distributed 
Data Interface (FDDI), token ring, etc.

Users of PVM can also configure their own parallel virtual machine, which can 
overlap with other users’ virtual machines. Configuring a personal parallel virtual 
machine involves simply listing the names of the machines in a file that is read when 
PVM is started.

A. ADVANTAGES OF PVM 

The first advantage of using PVM is a reduction in cost; it is and will continue to 
be costly to allocate large computing resources to each and every user. The beauty of 
using workstations for parallel processing is that a user of a workstation may not use the
machine all the time, but may need more than what a single workstation can provide
when applications are to be run. Many scientists are discovering that their computational
requirements are best served not by a single, monolithic machine but by a variety of 
distributed computing resources, linked by high-speed networks. 

The second advantage in network-based concurrent computing is the ready 
availability of development and debugging tools. Typically, systems that operate on 
loosely coupled networks permit the direct use of editors, compilers, and debuggers that 
are available on individual machines; also, users are already familiar with the use and 
individual idiosyncrasies of each tool so that learning new skills is not necessary. 

The third advantage is the potential fault tolerance of the network(s) and the 
processing elements. Most multiprocessors do not support such a facility; hardware or 
software failures in one of the processing elements often lead to a complete crash. 
Additionally, it is the opinion of the author that, for Naval applications, using different
workstations in different areas of a Naval ship can reduce vulnerability should the ship
take a direct hit in a critical area. The computing power needed for a combat system like
AEGIS could be maintained by switching to unaffected stations.

A study conducted by Eichelberger and Provencher (1993) explored using PVM 
to model a survivable AEGIS combat system for a CG47 Ticonderoga class AEGIS 
cruiser model. Present naval combat systems possess only manual reconfiguration and 
static rudimentary automatic reconfiguration schemes. The study concluded that there is 
a significant improvement in mission readiness when using a reconfigurable computer
architecture.


B. HISTORY OF PVM 

In the summer of 1989, the development of PVM software began at Oak Ridge National
Laboratory (ORNL); the software is now distributed freely in the interest of the
advancement of science around the world. The driving force behind the initial
popularity of PVM was the ability to get an excellent price/performance ratio, better than
any other computer system in the world. In general, a cluster of about 10 high
performance workstations is potentially capable of solving a problem as fast as a
supercomputer costing 20 times more; thus, PVM is rapidly becoming a de facto standard
for distributed computing. How did all this begin? The following is a brief history of
PVM's creation and its creators:


Summer 1989: Vaidy Sunderam designed and implemented the first version of
Parallel Virtual Machine while visiting Oak Ridge National
Laboratory.

Summer 1990: Vaidy Sunderam and Al Geist refined the PVM software to
develop a Fortran interface and several parallel applications;
additionally, a graphical interface called XPVM was developed.

November 1990: Al Geist developed a PVM version of a large material science
application code run on a network of IBM RS/6000's which won
the 1990 Gordon Bell Prize for best price/performance ratio of any
application in the world.

December 1990: Sunderam and Geist entered their PVM research into the 1990
IBM Supercomputer competition and won first prize.

March 1991: PVM 2.0 was developed by Bob Manchek from PVM 1.0, the
earlier research version. PVM 2.0 was made publicly available
through netlib@ornl.gov.

Summer 1991: Sunderam, Geist, and Manchek began working on the design
features of PVM 3.0 such as dynamic configuration and new
routine names. Additionally, a digest for users to exchange
information was set up at pvmlist@mathcs.emory.edu.

December 1991: Beguelin began the development of a new software package called
Xab, a monitor and debugger for PVM programs. This version can
be obtained by contacting adam@cs.cmu.edu.

February 1992: PVM 2.4 was released and HeNCE was made available through
netlib@ornl.gov.

Summer 1992: Geist and his student developed a package built on top of PVM 2.4
that dynamically load balances a user's application.

February 1993: PVM 3.0 released.

April 1993: PVM 3.1 released.

August 1993: PVM 3.2 is released. To receive this software send email to
netlib@ornl.gov with the message: send index from pvm3
or ftp from netlib2@cs.utk.edu directory pvm3.


C. COMPONENTS OF PVM 

The PVM system is actually composed of two parts, the daemon and a library of
PVM interface routines. 

The daemon is called pvmd3 (sometimes abbreviated pvmd) and resides on all the
computers making up the virtual machine. Any user with a valid login can install this 
daemon on a machine. When the user desires to run a PVM application, he/she executes 
pvmd3 on one of the computers which in turn starts up pvmd3 on each of the computers 
making up the user-defined virtual machine. A PVM application can then be started 
from a Unix prompt on any of these computers. 

The library of PVM interface routines contains routines for passing messages, 
spawning processes, coordinating tasks, and modifying the virtual machine. The user can 
call any of these routines and application programs must be linked with this library to use 
PVM. 

D. APPLICATIONS 
A variety of applications have been developed over the past few years using
PVM. Below is a partial list of some of these applications:


* Material Science
* Global Climate Modeling
* Atmospheric, oceanic, and space studies
* Meteorological forecasting
* 3-D ground water modeling
* Superconductivity, molecular dynamics
* Monte Carlo CFD application
* 2-D and 3-D seismic imaging
* 3-D underground flow fields
* Particle simulation
* Distributed AVS flow visualization

As a result of this thesis, one can add Orbital Prediction to this list.


Application programs are composed of subtasks (or components) at a moderate
level of granularity. The programs view the PVM system as a general and flexible
parallel computing resource which may be accessed in three different modes:

1. Transparent - subtasks are automatically located at the most
appropriate sites.

2. Architecture-dependent - the user chooses the specific architecture on which
particular subtasks execute.

3. Machine-specific - subtasks are located on a particular machine to
exploit particular strengths of individual machines.


During execution, multiple instances of each component or subtask may be
initiated. Figure 2.1 illustrates a simplified architectural overview
of the PVM system (see Geist and Sunderam, 1993, page 3).

Figure 2.1 Simplified Architectural Overview of PVM (diagram labels: Application 1, Application 2)


Application programs under PVM may possess arbitrary control and dependency 
structures; that is, at any point in the execution of a concurrent application, the processes 
in existence may have arbitrary relationships between each other and any process may 
communicate and/or synchronize with any other. Any specific control and dependency 
structure may be implemented under the PVM system by appropriate use of PVM
constructs and host language control flow statements.

Multiprocessing on loosely coupled networks provides facilities that are normally
not available on tightly coupled multiprocessors, for example, debugging support, fault
tolerance, and profiling and monitoring to find hot-spots or load imbalances within an
application.




The disadvantages associated with networked concurrent computing are
generating and maintaining multiple object modules for different architectures,
security considerations for personal workstations, and other administrative functions.
PVM supports two auxiliary components that provide some features to overcome these
disadvantages. First, the HeNCE interface is a graphical parallel programming
paradigm. Second, PVM is undergoing extensions to make it work on MPP
machines, which it now does on several made by Intel, TMC, Cray, and Convex, with
KSR and Sequent underway (Geist, 1993).

E. HETEROGENEOUS NETWORK COMPUTING ENVIRONMENT (HeNCE) 

HeNCE simplifies the writing of parallel programs and was developed with two
goals in mind:

1. Make network computing accessible without the need for extensive training in 
parallel computing 
and 
2. Make the resources best suited for a particular phase of the computation available 
to the users. 

In HeNCE the programmer explicitly specifies parallelism of a computation by 
drawing graphs. The nodes in a graph represent user defined subroutines (written in 
either FORTRAN or C) and the edges indicate parallelism and control flow. HeNCE will 
automatically execute the subroutines in parallel (whenever possible) across a network of
heterogeneous machines. HeNCE relies on the PVM system for process initialization
and communication. Those who wish to write explicit message-passing parallel programs
on a network of machines should use the PVM system directly.


Once the graph is complete, HeNCE will automatically write the parallel program
including all the communication and synchronization routines using PVM calls. HeNCE
tools exist to assist the user in compiling this program for a heterogeneous environment.
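
The idea that a dependency graph determines which subroutines may run concurrently can be
sketched as follows. This is not HeNCE's implementation, only an illustration in Python
using the standard graphlib module; the function parallel_stages and the four node names
are hypothetical.

```python
from graphlib import TopologicalSorter


def parallel_stages(dependencies):
    """Group graph nodes into stages whose members have no unmet dependencies.

    `dependencies` maps each node to the set of nodes that must finish first.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()                          # raises CycleError on a cyclic graph
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())    # every node that could run right now
        stages.append(ready)
        sorter.done(*ready)                   # mark the whole stage as finished
    return stages


# Hypothetical four-procedure application: two independent steps between a
# common input step and a final combining step.
graph = {
    "read_elements": set(),
    "propagate_a": {"read_elements"},
    "propagate_b": {"read_elements"},
    "merge_results": {"propagate_a", "propagate_b"},
}
stages = parallel_stages(graph)
```

Nodes that land in the same stage have no edges between them, so a scheduler would be free
to place them on different machines.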


HeNCE is composed of five integrated graphical tools. Below is a brief
explanation of each tool:

1. Compose - used to specify the parallelism of an application by drawing a
graph illustrating dependencies between procedures

2. Configure - used to specify a network of heterogeneous computers to be
used as the PVM and to define a cost matrix between machines
and procedures

3. Build - used to compile and install the procedures written by the
compose tool

4. Execute - used to dynamically map procedures to machines for execution
of the application and to collect tracing information

5. Trace - used to read the trace information and display an animation of
the execution, either in real time for debugging or later for
performance analysis.


An initial version of HeNCE is available through netlib. To obtain HeNCE,
send email to netlib@ornl.gov and on the subject line type: send index from
hence; any problems with HeNCE can be addressed to: hence@msr.epm.ornl.gov.

F. OTHER SOFTWARE PACKAGES 

Various other software packages have been developed that enable scientists to 
write heterogeneous programs; these, as well as PVM, have evolved over the last several 
years, but none of them can be considered fully mature. It is an exciting time in 
parallel computing and there are many grand challenges for scientists to explore. 

I would like to briefly discuss some of the other software packages, in order that 
the reader will be familiar with their names and features (see Dongarra, 1993). 

Examples of such other software packages include Express, P4, and Linda; however, it 
is important to note that these packages are by no means the only ones in existence. Each 
package is layered over the native operating systems, exploits distributed concurrent 
processing, and is flexible and general-purpose; all exhibit comparable performance. 
Their differences lie in their programming model, their implementation schemes, and 
their efficiency. 

The Express toolkit is a collection of tools that individually address various aspects of
concurrent computation. The toolkit is developed and marketed commercially by
ParaSoft Corporation, a company started by some members of the Caltech concurrent
computation project. Express is based on beginning with a sequential version of an
application and following a recommended development life cycle culminating in a
parallel version that is tuned for optimality. The core of the Express system is a set of
libraries for communication, I/O, and parallel graphics.

P4 is a library of macros and subroutines developed at Argonne National 
Laboratory for programming a variety of parallel machines. P4 supports both the 
shared-memory model and the distributed-memory model. In the process management 
mechanism in P4 there is a "master" process and "slave" processes, and multilevel 
hierarchies may be formed to implement what is termed a cluster model of computation. 
Shared memory support via monitors is a distinguishing feature of P4; however, this
feature is not distributed shared memory, but is a portable mechanism for shared address
space programming in true shared memory multiprocessors. A set of macro extensions
called Parmacs was developed at GMD (Gesellschaft für Mathematik und Datenverarbeitung
in Schloss Birlinghoven, Germany). Parmacs provided Fortran interfaces and a
variety of high-level abstractions dealing with global operations to the P4 system.

Linda is a concurrent programming model that has evolved from a Yale 
University research project. The primary concept in Linda is that of a "tuple-space", an
abstraction via which cooperating processes communicate. The tuple-space concept is
essentially an abstraction of distributed shared memory, with one important difference
(tuple-spaces are associative), and several minor distinctions (destructive and
non-destructive reads, and different coherency semantics are possible). Applications use
the Linda model by embedding constructs that manipulate the tuple space. Recently, a
new system technique has been proposed, at least nominally related to the Linda project.
This scheme, termed "Piranha", proposes a proactive approach to concurrent computing
where resources seize tasks from a well known location based on availability and
suitability.
III. SGP AND SGP4


A. SIMPLIFIED GENERAL PERTURBATION MODEL (SGP)

The original model used by the Air Force Space Command to track satellites was 
the Simplified General Perturbation model (SGP). The model was simplified by the 
exclusion of perturbation effects caused by higher order terms in the Legendre expansion 
of the Earth's gravitational potential or other celestial bodies like the moon or the sun. 
The model also assumed the drag effect on mean motion as linear in time; this 
assumption dictated a quadratic variation of mean anomaly with time. The drag effect on 
eccentricity was modeled such that the perigee height remained constant (Hoots and 
Roehrich (1980), page 2). 

These simplifications allowed an analytic solution to the equations of motion. 
Although the solutions are not as accurate as numerical techniques, they are 
computationally less expensive. Semi-analytic models offer a compromise: they are more
accurate than analytic models and less costly than numerical techniques. See Dyar (1993)
for comparison of various models in
terms of accuracy and computer time required on a Sun Sparc 10.

Hilton and Kuhlman (1966) developed the analytical SGP model. SGP's 
gravitational submodel is a simplification of the work done by Kozai (1959) and Brouwer 
(1959). For a more detailed discussion of the SGP model see Hoots and Roehrich (1980)
and Ostrom (1993), pp. 10-20.

B. SIMPLIFIED GENERAL PERTURBATION MODEL FOUR (SGP4)

1. Overview 

The second model, SGP4, was obtained by a simplification of a more extensive 
analytical theory developed by Lane and Cranford (1969) which uses the solution of 
Brouwer (1959) for its gravitational model and a power density function for its 
atmospheric model [Hoots and Roehrich (1980), p.2]. SGP4 had replaced SGP as the 
operational theory at the AFSPACECOM by 1976. 

The SDP4 extension to SGP4 was developed to be valid for deep-space satellites. 
The deep-space equations were developed by Hujsak (1979). SDP4 models the effects of 
the moon and sun in addition to certain sectoral and tesseral Earth harmonics that 
become important for half-day and one-day period orbits. 

SGP4 and its extension, SDP4, are both analytical models. They identify
variations in terms of changes in the osculating elements with respect to time. The
models are more accurate than the original SGP model due to two factors:

1. The inclusion of zonal harmonics through $J_4$, whereas the SGP model
only included zonal harmonics through $J_3$.
2. The inclusion of a drag force in the equations of motion versus the linear
simplification of the SGP model.

The main program, DRIVER, reads the input and calls either SGP4 or SDP4. If
the satellite is "near-earth" (e.g., orbital period less than 225 minutes) then SGP4 is
called; otherwise, the satellite is classified "deep-space" and DRIVER calls SDP4.

SGP4 and SDP4 receive input from the DRIVER and perform calculations
necessary to return to the DRIVER the position and velocity vector in units of earth radii
and minutes. The DRIVER performs a unit conversion to kilometers and seconds for
printout.
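
The two jobs of DRIVER described above, choosing the model and converting units, can be
sketched in Python. This is not the Fortran DRIVER itself; the function names are
hypothetical, the mean motion is assumed to be given in radians per minute, and the
conversion uses the value XKMPER = 6378.135 kilometers per earth radius quoted later in
this chapter.

```python
import math

XKMPER = 6378.135          # kilometers per earth radius
NEAR_EARTH_LIMIT = 225.0   # minutes; boundary between near-earth and deep-space


def select_model(mean_motion):
    """Return "SGP4" or "SDP4" given the mean motion in radians per minute (assumed unit)."""
    period = 2.0 * math.pi / mean_motion          # orbital period in minutes
    return "SGP4" if period < NEAR_EARTH_LIMIT else "SDP4"


def to_km_and_seconds(position_er, velocity_er_per_min):
    """Convert earth radii to kilometers and earth radii/minute to kilometers/second."""
    position_km = [x * XKMPER for x in position_er]
    velocity_km_s = [v * XKMPER / 60.0 for v in velocity_er_per_min]
    return position_km, velocity_km_s
```

For example, a satellite with a 90-minute period is routed to SGP4, while one with a
period near one day is routed to SDP4.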

SGP4 and SDP4 both call two functions, ACTAN and FMOD2P. ACTAN is
passed the values of sine and cosine and returns the angle in radians in the range of
0 to 2π. FMOD2P is passed an angle in radians and returns that angle modulo 2π.
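
A minimal Python sketch of what these two helper functions compute is given below. It
mirrors the behavior described in the text, not the Fortran source; the lowercase names
actan and fmod2p are stand-ins for ACTAN and FMOD2P.

```python
import math

TWO_PI = 2.0 * math.pi


def fmod2p(angle):
    """Reduce an angle in radians to the range 0 to 2*pi."""
    return angle % TWO_PI        # Python's % takes the sign of the divisor


def actan(sin_value, cos_value):
    """Return the angle in the range 0 to 2*pi whose sine and cosine are given."""
    return fmod2p(math.atan2(sin_value, cos_value))
```

Using the two-argument arctangent keeps the quadrant information that a plain arctangent
of the ratio would lose.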

Additionally, SDP4 calls the subroutine DEEP. The first time DEEP is called 
certain constants already calculated in SDP4 are passed through an entry called DPINT. 
All initialized quantities needed for deep-space prediction are calculated. At this time, it 
is also determined whether the orbit is synchronous or if the orbit experiences resonance
effects. During initialization, the subroutine DEEP calls the function THETAG. The 
function THETAG obtains the location of Greenwich at epoch and converts epoch to 
minutes since 1950. 

The next time SDP4 calls DEEP occurs during the secular update portion and is 
via the entry DPSEC. The secular update portion of SDP4 is where additional secular
and long-period resonance effects are added to the values of the "mean" orbital elements.


The final access to DEEP occurs via DPPER where the appropriate deep-space 
lunar and solar periodics are added to the orbital elements. 

2. Input Parameters 

The SGP4 model uses the six orbital elements, a drag factor, and an epoch 
reference time to predict the satellite position and velocity vectors at a future time. 

The six orbital elements are "mean" values obtained by removing periodic 
variations in a particular way. The elements are given below along with the name
assigned to each in the SGP4 Fortran computer code:





VARIABLE NAME                                  SYMBOL IN THEORY    COMPUTER CODE

Mean Motion                                    $n_o$               XNO
Eccentricity                                   $e_o$               EO
Inclination of Orbital Plane to the Equator    $i_o$               XINCO
Right Ascension of the Ascending Node          $\Omega_o$          XNODEO
Argument of Perigee                            $\omega_o$          OMEGAO
Mean Anomaly                                   $M_o$               XMO

Table 3-1 Classical Orbital Elements











The following diagram will be useful throughout this discussion in visualizing
the satellite's orbit and the angles given in Table 3-1 above:

Figure 3.1 Classical Orbital Elements (diagram labels: Earth's North Pole, Satellite, Orbit
Plane, Node Line, Perigee; E = eccentric anomaly, ν = true anomaly, related by Kepler's
equation $E - e \sin E = M = n(t - T)$)

3. Program Sequence Flow

The ten main steps to solve for position and velocity vectors are as follows:

1) Recover original mean motion and semimajor axis from the input elements.
2) If necessary, update the parameter for the SGP4 density function.
3) Calculate constants using appropriate values of the density function from
step two above.
4) Account for the secular effects of atmospheric drag and gravitation.
5) Add the long periodic terms.
6) Solve Kepler's equation.
7) Calculate the preliminary quantities needed for short periodics.
8) Update the osculating quantities using the short periodics.
9) Calculate the unit orientation vectors.
10) Calculate the position and velocity vectors.

The SDP4 model follows these same steps with the addition of several calls to the
subroutine DEEP which was discussed earlier.
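
Step 6 can be illustrated with a short Newton iteration for the classical form of Kepler's
equation, $E - e \sin E = M$. The sketch below is in Python and is an assumption-laden
illustration: the function name, starting guess, and tolerance are hypothetical choices,
and the operational code may solve a variant of the equation written in other variables.

```python
import math


def solve_kepler(mean_anomaly, eccentricity, tolerance=1e-12, max_iterations=50):
    """Solve E - e*sin(E) = M for the eccentric anomaly E (radians), with 0 <= e < 1."""
    # Common starting guesses: M for modest eccentricity, pi for high eccentricity.
    E = mean_anomaly if eccentricity < 0.8 else math.pi
    for _ in range(max_iterations):
        residual = E - eccentricity * math.sin(E) - mean_anomaly
        step = residual / (1.0 - eccentricity * math.cos(E))   # Newton correction
        E -= step
        if abs(step) < tolerance:
            return E
    raise RuntimeError("Kepler iteration did not converge")
```

Because the derivative $1 - e \cos E$ never vanishes for $e < 1$, each Newton step is well
defined.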


C. EQUATIONS 
This section will describe the equations developed by Hoots and Roehrich (1980),
pp. 14-37. The ten main steps listed above will serve as the outline of the discussion.
A strict parallel structure exists between the computer code and the equations.

1. Recover Original Mean Motion and Semimajor Axis

The input variable for mean motion ($n_o$) requires modification, after which it is
denoted by $n_o''$. This modification to $n_o$ is accomplished as follows:

1) $n_o'' = \dfrac{n_o}{1 + \delta_o}$ (relationship of $n_o''$ to $n_o$)

where

a. $\delta_o = \dfrac{3 k_2 (3\cos^2 i_o - 1)}{2 a_o^2 (1 - e_o^2)^{3/2}}$

b. $k_2 = \tfrac{1}{2} J_2 a_E^2$, $J_2$ = the second gravitational zonal harmonic of the earth,
$a_E^2$ = the equatorial radius of the earth squared

c. $a_o = a_1 \left(1 - \tfrac{1}{3}\delta_1 - \delta_1^2 - \tfrac{134}{81}\delta_1^3\right)$

d. $\delta_1 = \dfrac{3 k_2 (3\cos^2 i_o - 1)}{2 a_1^2 (1 - e_o^2)^{3/2}}$

e. $a_1 = \left(\dfrac{k_e}{n_o}\right)^{2/3}$ where $k_e = \sqrt{GM}$, $G$ is Newton's universal gravitational
constant and $M$ is the mass of the Earth.

2) To recover the semimajor axis use $a_o'' = \dfrac{a_o}{1 - \delta_o}$

where $\delta_o$ is the same as above.
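
The recovery of $n_o''$ and $a_o''$ can be sketched in Python as follows. The function name
is hypothetical, and the numerical values of $k_e$ and $k_2$ are assumptions taken from the
published Spacetrack Report No. 3 code (earth radii and minutes as units, $a_E = 1$); they
are not stated in this section.

```python
import math

# Assumed constants in earth-radii / minute units (Spacetrack Report No. 3 values).
KE = 0.0743669161      # sqrt(GM) in (earth radii)^1.5 per minute
K2 = 5.413080e-4       # (1/2) * J2 * aE^2 with aE = 1 earth radius


def recover_mean_elements(n_o, e_o, i_o):
    """Return (n_o'', a_o'') from mean motion (rad/min), eccentricity, inclination (rad)."""
    shape = (3.0 * math.cos(i_o) ** 2 - 1.0) / (1.0 - e_o ** 2) ** 1.5
    a_1 = (KE / n_o) ** (2.0 / 3.0)
    delta_1 = 1.5 * K2 * shape / a_1 ** 2
    a_o = a_1 * (1.0 - delta_1 / 3.0 - delta_1 ** 2 - (134.0 / 81.0) * delta_1 ** 3)
    delta_o = 1.5 * K2 * shape / a_o ** 2
    return n_o / (1.0 + delta_o), a_o / (1.0 - delta_o)
```

Since $\delta_1$ and $\delta_o$ are of order $10^{-4}$ for typical near-earth orbits, the
recovered values differ from the input mean motion and the Keplerian semimajor axis only
slightly.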


2. Update the Parameter for the SGP4 Density Function

Two parameters, $s$ and $q_o$, for the SGP4 density function may require
adjustments. The scale height parameter constant used by SGP4 is
$s = 1.01222928$ earth radii (er); $s$ changes depending on the height of the satellite at
perigee. For perigees between 98 kilometers and 156 kilometers, $s$ is replaced by $s^*$,
where $s^* = a_o''(1 - e_o) - s + a_E$ with units of earth radii and where perigee height is
calculated by perigee $= [a_o''(1 - e_o) - a_E] \cdot R_E$ (kilometers) and $R_E$ is the spherical
earth radius.

For perigees below 98 kilometers, $s$ is replaced by $s^*$ where

$s^* = \dfrac{20}{\mathrm{XKMPER}} + a_E$, XKMPER = 6378.135 kilometers/earth radius

It should be noted that if $s$ is changed then the term $(q_o - s)^4$ is also replaced
by $(q_o - s^*)^4$.

From this point on, the double-prime notation will be dropped for the mean
motion and the semimajor axis, as well as the * on $s$. It will be understood
that these corrections have already been made when the symbols $n_o$, $a_o$,
and $s$ are used.
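
The perigee-dependent choice of $s$ can be sketched in Python as below. The function name
is hypothetical, $a_E = 1$ earth radius is an assumption, and the treatment of the exact
boundary values 98 and 156 kilometers is a choice made for the sketch.

```python
XKMPER = 6378.135        # kilometers per earth radius
S_DEFAULT = 1.01222928   # default s in earth radii
A_E = 1.0                # assumed equatorial radius in earth radii


def density_parameter(a_o, e_o):
    """Return s (earth radii) for semimajor axis a_o (earth radii) and eccentricity e_o."""
    perigee_km = (a_o * (1.0 - e_o) - A_E) * XKMPER
    if perigee_km >= 156.0:
        return S_DEFAULT                                # high perigee: s unchanged
    if perigee_km >= 98.0:
        return a_o * (1.0 - e_o) - S_DEFAULT + A_E      # between 98 km and 156 km
    return 20.0 / XKMPER + A_E                          # below 98 km
```

The three branches join continuously at 156 and 98 kilometers, and whenever the returned
value differs from the default, the term $(q_o - s)^4$ must be recomputed with it.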


3. Calculate Constants 
a. The following constants are calculated for both SGP4 and SDP4: 
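$\theta = \cos i_o$

$\xi = \dfrac{1}{a_o - s}$

$\beta_o = (1 - e_o^2)^{1/2}$

$\eta = a_o e_o \xi$

$C_1 = B^* C_2$, $B^*$ = drag coefficient

$C_2 = (q_o - s)^4 \xi^4 n_o (1 - \eta^2)^{-7/2} \left[ a_o \left(1 + \tfrac{3}{2}\eta^2 + 4 e_o \eta + e_o \eta^3\right) + \dfrac{3 k_2 \xi}{2(1 - \eta^2)} \left(-\tfrac{1}{2} + \tfrac{3}{2}\theta^2\right)\left(8 + 24\eta^2 + 3\eta^4\right) \right]$

$C_4 = 2 n_o (q_o - s)^4 \xi^4 a_o \beta_o^2 (1 - \eta^2)^{-7/2} \left( \left[ 2\eta(1 + e_o\eta) + \tfrac{1}{2} e_o + \tfrac{1}{2}\eta^3 \right] - \dfrac{2 k_2 \xi}{a_o (1 - \eta^2)} \left[ 3(1 - 3\theta^2)\left(1 + \tfrac{3}{2}\eta^2 - 2 e_o\eta - \tfrac{1}{2} e_o\eta^3\right) + \tfrac{3}{4}(1 - \theta^2)(2\eta^2 - e_o\eta - e_o\eta^3)\cos 2\omega_o \right] \right)$

b. The following constants are calculated by SGP4 only for perigees above 220
kilometers:

$C_3 = \dfrac{(q_o - s)^4 \xi^5 A_{3,0}\, n_o a_E \sin i_o}{k_2 e_o}$ where $A_{3,0} = -J_3 a_E^3$

$C_5 = 2 (q_o - s)^4 \xi^4 a_o \beta_o^2 (1 - \eta^2)^{-7/2} \left[ 1 + \tfrac{11}{4}\eta(\eta + e_o) + e_o\eta^3 \right]$

$D_2 = 4 a_o \xi C_1^2$

$D_3 = \tfrac{4}{3} a_o \xi^2 (17 a_o + s) C_1^3$

$D_4 = \tfrac{2}{3} a_o \xi^3 (221 a_o + 31 s) C_1^4$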


4. Secular Effects of Atmospheric Drag and Gravitation 
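$M_o$, $\omega_o$, and $\Omega_o$ are updated as follows:

a. First, $M_{DF}$, $\omega_{DF}$, and $\Omega_{DF}$ are calculated:
1) $M_{DF} = M_o + \dot{M}\,\Delta t$
2) $\omega_{DF} = \omega_o + \dot{\omega}\,\Delta t$
3) $\Omega_{DF} = \Omega_o + \dot{\Omega}\,\Delta t$
where $\Delta t = t - t_o$ = time since epoch and

$\dot{M} = \left[1 + \dfrac{3k_2(-1 + 3\theta^2)}{2a_o^2\beta_o^3} + \dfrac{3k_2^2(13 - 78\theta^2 + 137\theta^4)}{16a_o^4\beta_o^7}\right]n_o$

$\dot{\omega} = \left[-\dfrac{3k_2(1 - 5\theta^2)}{2a_o^2\beta_o^4} + \dfrac{3k_2^2(7 - 114\theta^2 + 395\theta^4)}{16a_o^4\beta_o^8} + \dfrac{5k_4(3 - 36\theta^2 + 49\theta^4)}{4a_o^4\beta_o^8}\right]n_o$

$\dot{\Omega} = \left[-\dfrac{3k_2\theta}{a_o^2\beta_o^4} + \dfrac{3k_2^2(4\theta - 19\theta^3)}{2a_o^4\beta_o^8} + \dfrac{5k_4\theta(3 - 7\theta^2)}{2a_o^4\beta_o^8}\right]n_o$

Recall that $k_2 = \frac{1}{2}J_2 a_E^2$, where $J_2$ = the second gravitational zonal harmonic of the Earth,

and $k_4 = -\frac{3}{8}J_4 a_E^4$, where $J_4$ = the fourth gravitational zonal harmonic of the Earth.

Note: This is the point in SDP4 where the DEEP initialization for deep-space
calculations is entered through DPINT, discussed earlier.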


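b. Then $M_p$, $\omega$, and $\Omega$ are calculated by
1) $M_p = M_{DF} + \delta\omega + \delta M$
2) $\omega = \omega_{DF} - \delta\omega - \delta M$
3) $\Omega = \Omega_{DF} - \dfrac{21\,n_o k_2\theta\,C_1\,\Delta t^2}{2a_o^2\beta_o^2}$

If perigee is less than 220 kilometers
$\delta\omega = \delta M = 0$
otherwise,
$\delta\omega = B^* C_3(\cos\omega_o)\,\Delta t$

$\delta M = -\dfrac{2}{3}(q_o - s)^4 B^*\xi^4\dfrac{a_E}{e_o\eta}\left[(1 + \eta\cos M_{DF})^3 - (1 + \eta\cos M_o)^3\right]$

Note: At this point SDP4 calls the secular portion of DEEP via DPSEC to add
the deep-space secular effects and long-period resonance effects to the
six orbital elements.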


c. Next, $e$, $a$, and the mean longitude, $L$, are updated as follows:
1) $e = e_o - B^* C_4\,\Delta t - B^* C_5(\sin M_p - \sin M_o)$
2) $a = a_o\left[1 - C_1\Delta t - D_2\Delta t^2 - D_3\Delta t^3 - D_4\Delta t^4\right]^2$
3) $L = M_p + \omega + \Omega + n_o\left[\tfrac{3}{2}C_1\Delta t^2 + (D_2 + 2C_1^2)\Delta t^3 + \tfrac{1}{4}(3D_3 + 12C_1D_2 + 10C_1^3)\Delta t^4 + \tfrac{1}{5}(3D_4 + 12C_1D_3 + 6D_2^2 + 30C_1^2D_2 + 15C_1^4)\Delta t^5\right]$

If the perigee height is less than 220 kilometers then the $a$ and $L$ equations are
truncated after the $C_1$ term and the equation for $e$ is truncated after the $C_4$ term.

d. The last step in this section is to calculate $\beta$ and $n$:

1) $\beta = \sqrt{1 - e^2}$

2) $n = \dfrac{k_e}{a^{3/2}}$

Note: At this point SDP4 calls the periodics section of DEEP via DPPER to add
the deep-space lunar and solar periodics to the orbital elements.


5. Add The Long Periodic Terms 
The addition of long-periodic zonal effects is accomplished by the following:

a. $a_{xN} = e\cos\omega$

b. $a_{yN} = e\sin\omega + a_{yNL}$, where $a_{yNL} = \dfrac{A_{3,0}\sin i_o}{4k_2 a\beta^2}$

$a_{xN}$ and $a_{yN}$ are the horizontal and vertical components, respectively, of the
eccentricity vector with respect to the line of nodes vector. The following figure
illustrates the geometry of the components:

Figure 3-2 Geometry of Eccentricity Vector and Node Vector

The mean longitude is then calculated by:

$L_T = L + L_L$

where $L_L = \dfrac{A_{3,0}\sin i_o}{8k_2 a\beta^2}\,a_{xN}\left(\dfrac{3 + 5\theta}{1 + \theta}\right)$

Recall that $L$ was calculated in the previous section.

6. Solve Kepler's Equation

Solve Kepler's equation by a method of successive approximations.

Let $U = L_T - \Omega$

and $U = (E + \omega)_1$, the first term in the iteration of the sum of the eccentric anomaly and
the resulting argument of perigee. Thus,

$U_{i+1} = U_i + \Delta U_i$

for successive iterations, that is

$(E + \omega)_{i+1} = (E + \omega)_i + \Delta(E + \omega)_i$

Let $EPW = E + \omega$; then

$\Delta(EPW)_i = \dfrac{U - a_{yN}\cos(EPW)_i + a_{xN}\sin(EPW)_i - (EPW)_i}{-a_{yN}\sin(EPW)_i - a_{xN}\cos(EPW)_i + 1}$

Continue iterations until $|\Delta(EPW)_i| < 1.0 \times 10^{-6}$, then set $E + \omega = (E + \omega)_{i+1}$.
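The successive-approximation scheme above is a Newton iteration on $EPW = E + \omega$, and a minimal Python sketch of it follows. The function name, the tolerance of $10^{-6}$, and the cap of ten iterations are assumptions made for illustration; the update is the ratio of the residual of $U = EPW + a_{yN}\cos(EPW) - a_{xN}\sin(EPW)$ to its derivative.

```python
import math


def solve_kepler(U, a_xN, a_yN, tol=1.0e-6, max_iter=10):
    """Iterate EPW = E + omega starting from EPW_1 = U."""
    epw = U
    for _ in range(max_iter):
        numerator = U - a_yN * math.cos(epw) + a_xN * math.sin(epw) - epw
        denominator = 1.0 - a_yN * math.sin(epw) - a_xN * math.cos(epw)
        delta = numerator / denominator
        epw += delta                 # (EPW)_{i+1} = (EPW)_i + delta
        if abs(delta) < tol:
            break
    return epw
```

For the small eccentricities typical of near-earth satellites the correction shrinks quadratically, so only a few passes through the loop are normally needed.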


7. Short Periodic Preliminary Calculations 
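The following equations are the preliminary calculations; the results are added in
section eight to obtain the osculating quantities:

a. $e\cos E = a_{xN}\cos(EPW) + a_{yN}\sin(EPW)$

b. $e\sin E = a_{xN}\sin(EPW) - a_{yN}\cos(EPW)$

c. $e_L = (a_{xN}^2 + a_{yN}^2)^{1/2}$

d. $p_L = a(1 - e_L^2)$

e. $r = a(1 - e\cos E)$

f. $\dot{r} = k_e\dfrac{\sqrt{a}}{r}\,e\sin E$

g. $r\dot{f} = k_e\dfrac{\sqrt{p_L}}{r}$

$\mathrm{Temp3} = \dfrac{1}{1 + \sqrt{1 - e_L^2}}$

h. $\cos u = \dfrac{a}{r}\left[\cos(EPW) - a_{xN} + a_{yN}(e\sin E)\cdot\mathrm{Temp3}\right]$

i. $\sin u = \dfrac{a}{r}\left[\sin(EPW) - a_{yN} - a_{xN}(e\sin E)\cdot\mathrm{Temp3}\right]$

j. $u = \arctan\left(\dfrac{\sin u}{\cos u}\right)$

k. $\Delta r = \dfrac{k_2}{2p_L}(1 - \theta^2)\cos 2u$

l. $\Delta u = -\dfrac{k_2}{4p_L^2}(7\theta^2 - 1)\sin 2u$

m. $\Delta\Omega = \dfrac{3k_2\theta}{2p_L^2}\sin 2u$

n. $\Delta i = \dfrac{3k_2\theta}{2p_L^2}\sin i_o\cos 2u$

o. $\Delta\dot{r} = -\dfrac{k_2 n}{p_L}(1 - \theta^2)\sin 2u$

p. $\Delta r\dot{f} = \dfrac{k_2 n}{p_L}\left[(1 - \theta^2)\cos 2u - \dfrac{3}{2}(1 - 3\theta^2)\right]$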


8. Update The Osculating Quantities 


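Now, the short periodic preliminary results are added to obtain the osculating
quantities:

a. $r_k = r\left[1 - \dfrac{3}{2}k_2\dfrac{\sqrt{1 - e_L^2}}{p_L^2}(3\theta^2 - 1)\right] + \Delta r$
b. $u_k = u + \Delta u$
c. $\Omega_k = \Omega + \Delta\Omega$
d. $i_k = i_o + \Delta i$
e. $\dot{r}_k = \dot{r} + \Delta\dot{r}$
f. $(r\dot{f})_k = r\dot{f} + \Delta r\dot{f}$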


9. Calculate Unit Orientation Vectors 


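The osculating angles found above are utilized to find the unit orientation vectors
as follows:

$\mathbf{M} = \begin{bmatrix} M_x \\ M_y \\ M_z \end{bmatrix} = \begin{bmatrix} -\sin\Omega_k\cos i_k \\ \cos\Omega_k\cos i_k \\ \sin i_k \end{bmatrix}$

$\mathbf{N} = \begin{bmatrix} N_x \\ N_y \\ N_z \end{bmatrix} = \begin{bmatrix} \cos\Omega_k \\ \sin\Omega_k \\ 0 \end{bmatrix}$

then $\mathbf{U} = \mathbf{M}\sin u_k + \mathbf{N}\cos u_k$

and $\mathbf{V} = \mathbf{M}\cos u_k - \mathbf{N}\sin u_k$

10. Calculate The Position And Velocity Vectors

Finally, the position and velocity vectors are calculated as follows:

$\mathbf{r} = r_k\mathbf{U}$

$\dot{\mathbf{r}} = \dot{r}_k\mathbf{U} + (r\dot{f})_k\mathbf{V}$

This results in the position and velocity in units of earth radii and minutes. The
position and velocity vectors are then passed to the DRIVER, at which time the unit
conversion to kilometers and seconds is accomplished.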
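Steps 9 and 10 amount to a small amount of vector arithmetic, sketched below in Python. The function and argument names are hypothetical, and the velocity expression $\dot{\mathbf{r}} = \dot{r}_k\mathbf{U} + (r\dot{f})_k\mathbf{V}$ is taken from the standard SGP4 formulation in Spacetrack Report No. 3; the result is in earth radii and earth radii per minute.

```python
import math


def position_velocity(r_k, rdot_k, rfdot_k, u_k, node_k, i_k):
    """Return (position, velocity) as 3-tuples from the osculating quantities."""
    M = (-math.sin(node_k) * math.cos(i_k),
         math.cos(node_k) * math.cos(i_k),
         math.sin(i_k))
    N = (math.cos(node_k), math.sin(node_k), 0.0)
    # Unit orientation vectors U and V.
    U = tuple(m * math.sin(u_k) + n * math.cos(u_k) for m, n in zip(M, N))
    V = tuple(m * math.cos(u_k) - n * math.sin(u_k) for m, n in zip(M, N))
    position = tuple(r_k * c for c in U)
    velocity = tuple(rdot_k * uc + rfdot_k * vc for uc, vc in zip(U, V))
    return position, velocity
```

Because $\mathbf{M}$ and $\mathbf{N}$ are orthogonal unit vectors, $\mathbf{U}$ and $\mathbf{V}$ are orthogonal unit vectors as well, so the length of the position vector equals $r_k$.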
IV. PARALLELIZATION OF SGP4 USING PVM

A. OVERVIEW

The goals of this chapter are two-fold:

1. Explain how the Air Force Space Command's satellite code was parallelized
using the Parallel Virtual Machine and

2. Compare various algorithms in terms of total time, communication overhead,
speedup, and efficiency.

a. Speedup ($S_p$) is calculated as follows:

$S_p = \dfrac{T_1}{T_p}$

where

$T_1$ = Endtoend Time on a Single Processor
$T_p$ = Endtoend Time on p Processors

Note: Endtoend Time will be the term used to denote the total time to
execute the program, not including the time to read the input file.

b. Efficiency is calculated by:

$\text{Efficiency} = \dfrac{S_p}{p}$

where
$S_p$ = Speedup for p processors
$p$ = Number of processors
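The two performance measures are simple ratios, shown here as a small Python sketch. The function names are hypothetical and the timing values in the usage note are invented for illustration only; they are not measurements from this research.

```python
def speedup(t_single, t_parallel):
    """S_p = T_1 / T_p, using endtoend times in the same units."""
    return t_single / t_parallel


def efficiency(t_single, t_parallel, p):
    """Efficiency = S_p / p for p processors."""
    return speedup(t_single, t_parallel) / p
```

For example, a hypothetical single-processor time of 1200 seconds and an 8-processor time of 200 seconds would give a speedup of 6 and an efficiency of 0.75.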




Three algorithms were developed to study the performance of the parallelization
of the satellite code. The algorithms were based upon previous work completed by Ford
and Carvalho (1993).

Data was collected for each algorithm; each execution time is the result of an
average of ten recorded run times.

Analysis was performed on each algorithm's results by comparing each model's 
performance and the use of four, eight, and sixteen nodes to execute the tasks. 

It is important to note that with the use of an open network of computers there is 
undoubtedly going to be fluctuating machine and network loads. Multiple users and other 
competing PVM tasks cause the machine and network loads to change dynamically; thus, 
in order to have sufficient balancing, great care was taken to collect data at times where 
the load on the system was relatively constant. However, due to the fluctuation of open 
networks, the reproduction of the exact data results would be impossible. 

In addition to the system load discussed above, one needs to consider Load 
Balancing. Load Balancing refers to the degree to which all nodes are working to solve 
the problem at hand. There are generally three types of Load Balancing according to 
Geist (1993): 

1. Static Load Balancing 

The problem is divided into separate tasks which are assigned to the
processors only once. The number or size of each task can be varied
to utilize different computational powers of machines.




2. Dynamic Load Balancing by Pool of Tasks 
This is usually used with a Master and Slave scheme: the master continues to
deal tasks to idle slaves until the task queue is empty. This results in the faster
processors receiving more tasks.

3. Dynamic Load Balancing by Coordination 

Typically used by Single Program Multiple Data Stream (SPMD) where each 
processor receives a single set of instructions, receives and manipulates data, 
and redistributes its work at fixed times. 

The second type, Dynamic Load Balancing by Pool of Tasks with a Master
and Slave scheme, was utilized in this research.

The Master/Slave approach is currently a popular distributed programming
scheme. The Master starts all the Slave tasks and coordinates their work and
input/output. All three algorithms developed use a Master/Slave approach.

Two other distributed programming schemes are the "hostless" Single Program
Multiple Data (SPMD) and the Functional schemes (Geist, 1993). The "hostless" SPMD
scheme uses the same program executed on different pieces of the problem, whereas the
Functional scheme consists of several programs, each of which performs a different function in
the application.

B. INPUT DATA

Approximately 8000 satellites are tracked by the Air Force Space Command
(AFSPACECOM) in Colorado Springs daily; thus, a file consisting of 8000 satellite entries
was created. Note that the same near-earth record and deep-space record were copied to
generate the 8000 input records.

Each entry or input record consists of twenty-two individual numerical values.
Table 4-1 illustrates a typical input record used.

Note that the input record used by AFSPACECOM consists of seventeen
individual numerical values (see Hoots and Roehrich, 1980, p. 91). Table 4-2
illustrates a typical AFSPACECOM record.

There is a direct correspondence between the first 17 values of the input record
used in this research and the first 16 values of the AFSPACECOM record. The
seventeenth entry in the AFSPACECOM record is the epoch revolutions that have been
recorded since the object was first launched. Note that this information is not used to
calculate the position and velocity vectors of the satellite.

Entries 18-22 in Table 4-1 simulate the number of calls made either to SGP4
or SDP4 per input record, as will be explained later.
Table 4-1 Example of Input Record

Entry                       Description                                       Example
International designator    Year/Launch No./Piece                             93-022B
Epoch time                  Year and day: the first 2 digits are the year,
                            the others are the day
Mean motion dot dot         Mean motion 2nd derivative/6
BSTAR                       Drag term (er^-1): the -3 is the exponent         45562-3
                            Denotes model: 2 is for SGP4
Eccentricity                                                                  000673
Argument of perigee                                                           37.4873
Mean motion (rev/day)                                                         15.03410861

Table 4-2 Typical Input values for AFSPACECOM
Entry number 17 in Table 4-1 and entry number 16 in Table 4-2, the mean
motion (XNO), determines whether or not the satellite is a deep-space object. SGP4
propagates data for near-earth satellites, which require more frequent tracking due to the
atmospheric drag factor, and SDP4 propagates data for the deep-space satellites.

In order for an object to be classified as deep-space the period must be greater
than 225 minutes. The period is calculated by

$\text{Period} = \dfrac{2\pi}{n_o}$.

For a period greater than 225 minutes XNO must be less than 6.4 since, with XNO in revolutions per day:

$\text{Period} = \dfrac{2\pi}{\mathrm{XNO}\left(\dfrac{2\pi\ \text{rad}}{\text{rev}}\right)\left(\dfrac{1\ \text{day}}{24\ \text{hours}}\right)\left(\dfrac{1\ \text{hour}}{60\ \text{min}}\right)} = \dfrac{1440}{\mathrm{XNO}}\ \text{min} > 225\ \text{min}$

Rearrange and solve for XNO:

$\mathrm{XNO} < \dfrac{1440\ \text{min}}{225\ \text{min}}$

That is,

$\mathrm{XNO} < 6.4\ \dfrac{\text{rev}}{\text{day}}$

Thus, the example in Table 4-1 illustrates a deep-space satellite and the example
in Table 4-2 illustrates a near-earth satellite.
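The classification rule can be written as a short Python check. This is only a sketch with hypothetical function names; it assumes the mean motion XNO is supplied in revolutions per day, as in the input records.

```python
MINUTES_PER_DAY = 1440.0
DEEP_SPACE_PERIOD_MIN = 225.0


def period_minutes(xno_rev_per_day):
    """Orbital period in minutes for a mean motion in revolutions per day."""
    return MINUTES_PER_DAY / xno_rev_per_day


def is_deep_space(xno_rev_per_day):
    """True when the period exceeds 225 minutes (XNO below 6.4 rev/day)."""
    return period_minutes(xno_rev_per_day) > DEEP_SPACE_PERIOD_MIN
```

With the mean motion of 15.03410861 rev/day from Table 4-2 the period is roughly 96 minutes, so that record would be propagated by SGP4 rather than SDP4.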
Out of the 8000 satellites tracked, approximately 85% are near-earth and 15% are
deep-space; therefore, 6800 of the 8000 input records (consisting of 22 elements each)
were near-earth and the remaining 1200 records were deep-space.

The requirement for more frequent tracking of near-earth satellites was simulated
by requiring 72 calls to the SGP4 subroutine per input record, resulting in 72 output
records generated per input record. If the satellite was deep-space the SDP4 subroutine
was called 24 times per input record, resulting in 24 output records generated per input
record. The values 72 and 24 were chosen to parallel the work done by Ostrom (1993). The output
record consisted of the time since the last propagation, three components of the position
vector, and three components of the velocity vector for a total of 7 output data elements
per output record.

To illustrate how this was accomplished, consider the input record in Table 4-1.
The difference between the start and stop year and day is one day, or 1440 minutes. The time step of
60 minutes/call (over a period of 1440 minutes) resulted in 24 calls to the SDP4
subroutine.
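The size of the resulting workload follows directly from these numbers, as the following Python sketch shows. The function names are hypothetical; the default arguments are the figures quoted above.

```python
def calls_per_record(span_minutes, step_minutes):
    """Number of propagation calls for one input record."""
    return span_minutes // step_minutes


def workload(n_records=8000, near_fraction=0.85, near_calls=72,
             deep_calls=24, values_per_output=7):
    """Return (near-earth records, deep-space records,
    output records, output data elements)."""
    near = round(n_records * near_fraction)
    deep = n_records - near
    outputs = near * near_calls + deep * deep_calls
    return near, deep, outputs, outputs * values_per_output
```

With the defaults this gives 6800 near-earth and 1200 deep-space records, 518,400 output records in total, and 3,628,800 output data elements.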

C. ALGORITHMS 

1. Overview 

Three algorithms were considered in order to maximize load balancing and 
minimize communication overhead. All three algorithms used PVM to simulate a 2D torus 
topology. A 2D torus is like a 2D mesh with the addition of communication links between
the nodes located at the "edge" of the mesh.
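The wrap-around links of a 2D torus can be made concrete with a short Python sketch. The row-major node numbering and the function name are assumptions made for illustration; the text does not state how the PVM nodes were numbered.

```python
def torus_neighbors(node, rows, cols):
    """Neighbors of a node in a rows x cols torus, nodes numbered row by row."""
    r, c = divmod(node, cols)
    return {
        "north": ((r - 1) % rows) * cols + c,   # wraps from the top row to the bottom
        "south": ((r + 1) % rows) * cols + c,   # wraps from the bottom row to the top
        "west": r * cols + (c - 1) % cols,      # wraps from the left edge to the right
        "east": r * cols + (c + 1) % cols,      # wraps from the right edge to the left
    }
```

In a plain mesh a corner node has only two neighbors; in the torus every node has four links because the modulo arithmetic connects opposite edges.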
2. Methods

a. Sequential
The Sequential program was developed to be the most efficient obtainable,
in order to ensure the record of speedup values would not be misleading.

(1) Sequential Algorithm

READ DATA FILE
REPEAT
    CALL PROPAGATION SUBROUTINE
UNTIL all input records have been converted to position and
    velocity vectors
COLLECT timing statistics

The sequential program can be found in Appendix A.

b. Parallel
In the following discussion the term "node" will denote one Unix-based
workstation in a given network; specifically, one Sun Microsystems
SPARCstation IPX.

In order to maximize the load balancing, a dynamic load balancing method by a
pool of tasks was utilized. One node was designated the "Master" while the other nodes
became the "Slaves". One of the slave nodes was designated as a collecting node. A
separate collecting node is an advantage over having the master collect, since collection
will begin before distribution is complete. This is also similar to the configuration used
by Phipps (1992) and Ostrom (1993) in their work on parallel orbit prediction on the
INTEL Hypercube.

When four nodes were utilized, one node acted as the master and dealt tasks to two
working nodes to complete. The remaining node acted as the collector by collecting the
results from the working nodes and returning the results to the master. The research
conducted by Ford and Carvalho (1993) concluded that a separate collecting node is a
definite advantage over having the master collect, since collection can begin even before
the distribution is complete.

In a similar fashion, when eight nodes were utilized there was a total of six working
nodes and when sixteen nodes were utilized there was a total of fourteen working nodes.

3. Parallel Algorithms 

a. Answer Back Method (ABM) 

The first approach was to minimize the time a worker spent idle waiting for more
data. The requirement was that the slave notify the master when it had completed its tasks
and was ready for more data. This would result in the fastest workers processing the most
data. The algorithm for the Master Program is as follows:

READ entire satellite catalog input file
ENROLL in PVM and spawn n + 1 slaves
DESIGNATE 1 collector and n workers
REPEAT
    PACK m sets of satellite input records
    SEND data to worker
UNTIL each worker has m sets each
REPEAT
    PACK m sets of satellite input records
    WAIT until worker sends ready signal
    SEND data to worker
UNTIL all complete sets of m have been sent
REPEAT
    PACK any leftover satellite input records
    WAIT until worker sends ready signal
    SEND data to worker
UNTIL 8000 input records have been sent
SEND stop signal to workers
WAIT for program complete signal from collector
GATHER and compute timing statistics from slaves.

The algorithm for the Answer Back slave program is as follows:

INITIALIZATION
IF I am the collecting node
    REPEAT
        WAIT for one set of results
        STORE results
    UNTIL all results have been collected from the workers
    SEND program complete signal to master
ELSE
    I'm a working node
    REPEAT
        WAIT for data packet from master
        REPEAT
            UNPACK data
            CALL propagation subroutine
            PACK results
            SEND results to the collector
        UNTIL no more input records in the packet
        SEND ready for more data signal to the master
    UNTIL master sends stop signal
END IF.

The Answer Back program can be found in Appendix A.
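The effect of the ready signal can be illustrated without PVM by a small discrete-event simulation in Python. This is a sketch under stated assumptions, not the Appendix A program: the function name is hypothetical, each packet is taken to hold up to m records, the worker speeds (records per unit time) are invented, and communication time and the collector are ignored.

```python
import heapq


def simulate_answer_back(n_records, m, worker_speeds):
    """Return the number of records processed by each worker."""
    processed = [0] * len(worker_speeds)
    remaining = n_records
    ready = []  # heap of (time the worker sends its ready signal, worker id)

    # Initial deal: every worker receives one packet without asking.
    for w, speed in enumerate(worker_speeds):
        packet = min(m, remaining)
        if packet == 0:
            break
        remaining -= packet
        processed[w] += packet
        heapq.heappush(ready, (packet / speed, w))

    # Afterwards the master serves whichever worker answers back first.
    while remaining > 0:
        t, w = heapq.heappop(ready)
        packet = min(m, remaining)
        remaining -= packet
        processed[w] += packet
        heapq.heappush(ready, (t + packet / worker_speeds[w], w))
    return processed
```

Running `simulate_answer_back(8000, 15, [1.0, 2.0])` shows the worker that is twice as fast receiving roughly twice as many records, which is the behavior the Answer Back Method is designed to produce.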
b. Successive Deal Methods 
The second and third algorithms were developed to decrease the communication
time between the master and slaves. The input records were dealt to the workers in sets,
m at a time. After giving each worker an initial set, the master continued to deal input
records until all 8000 records had been sent.

The successive deal methods are basically the same; the difference lies in the way
the input data is dealt to each worker.

In the second algorithm (Successive Deal Model I), to study the result of sending
larger data packets, each worker is dealt an input data set consisting of m records with 22
elements each. Next, 1/(2*p) of the remaining records are dealt to each worker. Finally,
1/p of the remaining records are dealt to each worker. Note that if any records are left over as
a result of the integer division, the leftovers are sent last. For example, if

n = number of data records
m = number of records sent simultaneously
p = number of working processors or nodes
s = sets of m records to be distributed

and we let

n = 8000
m = 15
p = 2

then the number of sets to be distributed is

$s = \dfrac{8000\ \text{records}}{15\ \text{records/set}} = 533$ sets of 15

with 5 input records left over. Now, a set is sent to each worker, leaving a total of 531 sets
left to be distributed. Next, 1/(2*p) of the remaining sets are dealt to each worker; that is,

$\left(\dfrac{1}{2 \cdot p}\right) \cdot (531\ \text{sets}) = 132$ sets are given to each worker.

Thus, the number of sets left to be distributed is

s = 531 - (2*132) = 267 sets.

Next, 1/p of the remaining sets are dealt to each worker; that is, (1/2)*267 sets = 133 sets are
distributed to each worker, leaving 1 set left over. Finally, the leftovers are sent to a worker and all the
input records have been distributed.
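The arithmetic of the Successive Deal Model I example can be checked with a few lines of Python. The function name and the dictionary keys are hypothetical; the integer divisions mirror the worked example above.

```python
def successive_deal_one(n, m, p):
    """Packet sizes (in sets of m records) for the three deals of Model I."""
    sets, leftover_records = divmod(n, m)
    remaining = sets - p               # first deal: one set to each worker
    second = remaining // (2 * p)      # second deal: 1/(2*p) of what remains
    remaining -= second * p
    third = remaining // p             # third deal: 1/p of what remains
    remaining -= third * p
    return {
        "sets": sets,
        "second_deal": second,
        "third_deal": third,
        "leftover_sets": remaining,
        "leftover_records": leftover_records,
    }
```

For n = 8000, m = 15, and p = 2 this reproduces the values in the text: 533 sets, 132 sets per worker in the second deal, 133 sets per worker in the third deal, and 1 set plus 5 records left over.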
In the third algorithm, the Successive Deal Model II, the master deals out one set
consisting of m input records to each worker. Then, the master continues to deal out data
sets until all the records have been distributed. For example, using the variables defined
above, let

n = 8000
m = 15
p = 2

then,

$s = \dfrac{8000\ \text{records}}{15\ \text{records/set}} = 533$ sets + 5 records left over.

First, one set is given to each worker, resulting in 531 sets left. Then, the sets would be
distributed, one at a time, first to one worker and then to the other worker. Last, the
leftover records are sent.
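Dealing one set at a time to each worker in turn is a round-robin distribution, sketched here in Python. The function name is hypothetical, and which worker receives the final leftover records is not modeled because the text does not specify it.

```python
def successive_deal_two(n, m, p):
    """Return (sets dealt to each worker, leftover records) for Model II."""
    sets, leftover_records = divmod(n, m)
    counts = [0] * p
    for k in range(sets):
        counts[k % p] += 1   # one set at a time, cycling through the workers
    return counts, leftover_records
```

For n = 8000, m = 15, and p = 2 the two workers receive 267 and 266 sets, with 5 records left over.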

(1) Successive Deal Method I (SDI) Algorithm 


Master Algorithm:

READ entire satellite catalog input file
ENROLL in PVM and spawn n + 1 slaves
DESIGNATE 1 collector and n workers
REPEAT
    PACK one set of m input records
    SEND data to worker
UNTIL each worker has one set
REPEAT
    PACK 1/(2*p) of the remaining records
    SEND data to worker
UNTIL each worker has a packet
REPEAT
    PACK remaining sets
    SEND data to worker
UNTIL each worker has an equal packet
REPEAT
    PACK leftovers
    SEND leftovers
UNTIL all input records have been sent
SEND stop signal to workers
WAIT for program complete signal from collector
GATHER and compute timing statistics from slaves.

Slave Algorithm:

INITIALIZATION
IF I am the collecting node
    REPEAT
        WAIT for one set of results
        STORE results
    UNTIL all results have been collected from the workers
    SEND program complete signal to master
ELSE
    I'm a working node
    REPEAT
        WAIT for data packet from master
        REPEAT
            UNPACK data
            CALL propagation subroutine
            PACK results
            SEND results to the collector
        UNTIL no more input records in the packet
    UNTIL master sends stop signal
END IF

(2) Successive Deal Model II (SDII) Algorithm

Master Algorithm:

READ entire satellite catalog input file
ENROLL in PVM and spawn n + 1 slaves
DESIGNATE 1 collector and n workers
REPEAT
    PACK one set of m input records
    SEND one set to each worker
UNTIL each worker has one set
REPEAT
    PACK m sets of input records
    SEND data to worker
UNTIL all m sets have been distributed
REPEAT
    PACK remaining input records
    SEND data to worker
UNTIL all input records have been distributed
SEND stop signal to workers
WAIT for program complete signal from collector
GATHER and compute timing statistics from slaves.

Slave Algorithm:

INITIALIZATION
IF I am the collecting node
    REPEAT
        WAIT for one set of results
        STORE results
    UNTIL all results have been collected from the workers
    SEND program complete signal to master
ELSE
    I'm a working node
    REPEAT
        WAIT for data packet from master
        REPEAT
            UNPACK data
            CALL propagation subroutine
            PACK results
            SEND results to the collector
        UNTIL no more input records in the packet
    UNTIL master sends stop signal
END IF.

For the source code of the algorithms discussed above see Appendix A. The
programs developed were written in C. The SGP4 code is written in FORTRAN. The
C framework using a PVM architecture calling a FORTRAN satellite propagation
subroutine was successful.

D. PROGRAM OVERVIEW 


1. Sequential 
The sequential version was executed 10 times and the total run times were
averaged. This was done four times and the four average values were averaged, resulting
in a sequential time $T_1$, which is used in the calculation of speedup.

The total time for the program to execute did not include the initial time to read the
entire input catalog because this was done one time only at the beginning of each program.
From this point on the total time to execute the program, excluding read time, will be
called endtoend time. The sequential average endtoend time was used
in the calculation of speedup, which will be discussed in the Parallel section below.

2. Parallel

In each program discussed under the Parallel Algorithms section above, time clocks
were inserted at various locations in order to measure the time to read the entire input
catalog, the endtoend time, the worker's communication time, and the worker's calculation
time.

The number of satellite input records (consisting of the 22 input values) sent
simultaneously to each worker was chosen to be either 5, 10, 15, 20, 25, 30, 35, 40, 45,
50, or 55. This was based upon previous work done by Ford and Carvalho (1993).

The number of nodes utilized was 4, 8, or 16. To configure the personal parallel
virtual machine, the names of the Unix-based machines used were listed in a file called
hostfile. When PVM was started by the command pvmd3 hostfile &, the hostfile was
automatically read and the machines were ready to act as nodes in a parallel application.

The machine from which the application was started acted as the master and the
slave nodes were spawned by first specifying the number of nodes desired (num_nodes)
and then executing the statement

num = pvm_spawn(SLAVENAME, (char**) 0, 0, "", num_nodes, tids).

The selection of 4, 8, or 16 nodes was based upon previous work done by Ostrom (1993)
in the parallelization of the SGP4 code using the Naval Postgraduate School INTEL
iPSC/2 Hypercube. This is a Multiple Instruction stream, Multiple Data stream (MIMD)
multicomputer. It consists of a system resource manager called the host, and eight
individual processors, referred to as nodes.

Data for each set of choices discussed above was collected for ten iterations of the
entire program and these results were averaged.

a. Analysis
For endtoend time, percent worker communication, speedup, and
efficiency, two comparisons were analyzed to measure the performance
of each algorithm:
(1) For a given algorithm, the performance of four, eight, and sixteen
nodes utilized was compared and
(2) For a given number of nodes, the three algorithms' performance
was compared.
For both cases above the number of satellite input records sent
simultaneously was either 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, or 55.
It is important to note that for all cases, the same input record was utilized;
thus, for all three models the number of calls made to SGP4 and SDP4 was the same.

E. RESULTS

1. Read Time
The time to read the data file (consisting of 8000 records) varied from
approximately 39 seconds to 1100 seconds. Thus, the read time was extremely dependent
on the load on the system at the time the data file was read. This was in contrast to the
results found by Ford and Carvalho (1993); the number of input records used in their
research was 630 and the read time was approximately 5 seconds for each execution.
2. Endtoend Time
The endtoend time is the most important time considered because it is a reflection
of the total performance of each algorithm designed.

a. Method Comparison

For 4 and 8 nodes, the optimal performance was achieved by the Answer Back
Method (ABM). For 16 nodes, with the exception of sending 15, 50, or 55 records at a
time, the ABM was superior. That is, when sending 5, 10, 20, 25, 30, 35, 40, and 45
records simultaneously, the ABM produced the fastest times.

From this point on in this analysis, when a given algorithm is superior in the
majority of the cases (as shown above) the term "in general" will be used. For the case
above, one would say "When 16 nodes were utilized, in general, the Answer Back Method
(ABM) was the best." The following graphs illustrate these results:

(1) Using four nodes the Answer Back Method was the fastest.

Figure 4.1 Four Node Comparison of Models (endtoend time in seconds versus number of satellite input records sent simultaneously)

(2) Using eight nodes the Answer Back Method was the fastest.

Figure 4.2 Eight Node Comparison of Models

(3) Using sixteen nodes, in general, the Answer Back Method was fastest.

Figure 4.3 Sixteen Node Comparison of Models

b. Node Comparison
For the analysis comparing the performance of various choices of nodes for a given
algorithm the following conclusions can be made:
(1) For the Answer Back Method, a choice of eight nodes was the best,
closely followed by sixteen nodes.

Figure 4.4 Answer Back Model Node Comparison

(2) For the Successive Deal Method I, a choice of sixteen nodes is superior.

Figure 4.5 Successive Deal Method I - Node Comparison

(3) For the Successive Deal Method II, a choice of sixteen nodes is
superior. It is not surprising that sixteen nodes is the best choice for both Successive Deal
Methods because both algorithms are very similar; in general, one can note that increasing the number
of nodes utilized should decrease the endtoend time. The Successive Deal Method II
results can be seen in Figure 4.6.

Figure 4.6 Successive Deal Method II - Node Comparison

It is interesting to note that for the Answer Back Model utilizing eight nodes was
superior to sixteen nodes for all cases. This could be attributed to the fact that with
sixteen nodes the communication time (which was naturally greater in the Answer Back
Model) between the master and slaves decreased the advantages of parallelization, whereas
with eight nodes the advantages of parallelization outweighed the disadvantage of the
communication time between the master and the slaves.

3. Percent Worker Communication 

As one can see from the analysis above, communication time is an important factor 
in the performance of a given algorithm. 
In "PVM Concurrent Computing System: Evolution, Experiences, and Trends,"
Sunderam, Geist, and Manchek (p. 7, 1993) state that PVM normally operates in a
general purpose networked environment and, as a result, raw performance or speedup of a
given application is hard to measure. They go on to state that "in such a scenario, most of
the focus is on communications overhead."

With communications overhead in mind, the time each worker spent
communicating versus the time spent calculating was evaluated. Using average values, the
percent of time the worker communicates was calculated as follows:

% Worker Communication Time = $\dfrac{\text{Communication Time}}{\text{Communication Time} + \text{Calculation Time}} \cdot (100\%)$.

The goal was to increase the amount of time a worker spent calculating and
decrease the time a worker spent communicating, resulting in a small communication
overhead.

a. Model Comparison 
For a given number of nodes, the performance of the three models in terms 
of communication overhead was evaluated and the results are as follows: 
(1) Utilizing four nodes for each model produced varied results; m general, 
the ABM and the SDII were the best choices. The minimum percent worker 
communication time was attained by the SDI Method when sending 35 satellite mput 


records at a time. 
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Figure 4.7 Percent Worker Communication For Each Model Using 4 Nodes 
(2) When utilizing eight nodes, both Successtve Deal Models were, in 
general, superior over the Answer Back Model. The mmimum percent worker 


communication was attained by SDI when sending 55 satellite mput records at a time. 
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Figure 4.8 Percent Worker Communication For Each Model Using 8 Nodes 
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(3) When uhlizing sixteen nodes, again the Successive Deal Methods were 
superior over the Answer Back Method. The minimum percent worker communication 


was attained by the SDI when sending 35 input records at a time. 
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Figure 4.9 Percent Worker Communication For Each Model Using 16 Nodes 
The Successtve Deal II proved to be the best choice mm terms of communication 
overhead. The Answer Back Method required the additional communication between the 
master and slaves which increased the communication overhead. The Successive Deal I 
message size was significantly larger, producing slightly inferior results than the Successive 
Deal I which continually dealt out small packets of data. 
b. Node Comparison 
For a given algorithm, the performance of four, eight, and sixteen nodes was 


evaluated in terms of communication overhead. The results are as follows: 
(1) For the ABM, the utilization of 4 nodes was superior.
Figure 4.10 ABM Percent Worker Communication - Node Comparison

(2) For the SDI, in general, the utilization of 4 nodes was the best.
Figure 4.11 SDI Percent Worker Communication - Node Comparison

(3) For the SDII, the use of four or eight nodes was the best choice.
Figure 4.12 SDII Percent Worker Communication - Node Comparison

These results are not surprising: for a given algorithm, each worker's
calculation time is approximately constant (since they all utilize the same input
record), and the communication time between the master and slaves is reduced when there
are fewer slaves.
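
The percent worker communication plotted in the figures above can be illustrated with a minimal sketch. It follows the ratio computed in the Appendix A master programs (average per-worker communication time divided by average per-worker calculation time, times 100). The function name `percent_worker_comm` and its parameter names are hypothetical and chosen only for this illustration.

```c
#include <assert.h>

/* Percent worker communication: per-worker communication time relative
 * to per-worker calculation time, expressed as a percentage.
 * total_comm, total_calc: times summed over all workers (seconds).
 * workers: number of working nodes (slaves excluding the collector). */
double percent_worker_comm(double total_comm, double total_calc, int workers)
{
    assert(workers > 0 && total_calc > 0.0);
    double per_worker_comm = total_comm / workers;
    double per_worker_calc = total_calc / workers;
    return (per_worker_comm / per_worker_calc) * 100.0;
}
```

With made-up inputs of 6 seconds of total communication and 24 seconds of total calculation over 3 workers, the function returns 25 percent.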

4. Speedup

As mentioned earlier, in a general purpose network environment, speedup is hard to
measure with a great deal of confidence.
Recall that speedup ($S_p$) is calculated as follows:

$$S_p = \frac{T_1}{T_p}$$

where $T_1$ = Endtoend Time on a Single Processor
      $T_p$ = Endtoend Time on $p$ Processors


Ideally, the speedup equals p, the number of processors; however, due to
communication costs, sequential bottlenecks, and computational tasks not necessary on a
single processor, the speedup is less than p.
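
As a minimal sketch of the speedup definition above, the ratio can be written as a small C helper. The function name `speedup` and the sample times in the note below are hypothetical; they are not measurements from this thesis.

```c
#include <assert.h>

/* Speedup: endtoend time on one processor divided by the endtoend
 * time on p processors. Both times must use the same unit. */
double speedup(double t1, double tp)
{
    assert(t1 > 0.0 && tp > 0.0);
    return t1 / tp;
}
```

For example, a hypothetical run that takes 200 seconds on one processor and 80 seconds on several processors has a speedup of 2.5, which falls short of the ideal value p whenever more than two processors are used.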

With the limitations of the speedup results discussed above in mind, the following
results were found.

a. Model Comparison 


(1) Utilizing four nodes for each model, the ABM was superior. 


Figure 4.13 Speedup Model Comparison When Using Four Nodes (x-axis: Number of Satellite Input Records Sent Simultaneously)
(2) Utilizing eight nodes for each model, the ABM was superior.
Figure 4.14 Speedup Model Comparison When Using Eight Nodes

(3) Utilizing sixteen nodes, the ABM, in general, was the best.
Figure 4.15 Speedup Model Comparison When Using Sixteen Nodes
b. Node Comparison

(1) For the Answer Back Model, using 8 or 16 nodes was superior.
Figure 4.16 Answer Back Model Speedup

(2) For the Successive Deal I, the use of 16 nodes was superior.
Figure 4.17 Successive Deal I Speedup
(3) For the Successive Deal II, utilizing 16 nodes was superior.
Figure 4.18 Successive Deal II Speedup
These speedup results are directly related to endtoend performance. If one
compares Figures 4.4-4.6, the endtoend times for each model, with Figures 4.16-4.18, the
speedups above, an inverse relationship is noted.


5. Efficiency

Recall, Efficiency = $E = \frac{S_p}{p}$

where $S_p$ = Speedup for $p$ processors
      $p$ = Number of processors

Thus, the efficiency is a measure of the speedup per processor, or how close the
actual speedup is to the theoretical speedup ($p$). The efficiency was evaluated in terms of a
comparison of models and a comparison of the node performance for a given model. The
results are as follows:
a. Model Comparison

(1) Utilizing 4 nodes, the Answer Back Model was superior.
Figure 4.19 Four Node Efficiency Model Comparison

(2) Utilizing 8 nodes, the Answer Back Model was superior.
Figure 4.20 Eight Node Efficiency Model Comparison
(3) For sixteen nodes, there was a large fluctuation for all models; however,
in general the Answer Back Model was the best choice.

b. Node Comparison

(1) For the ABM, the utilization of 4 or 8 nodes was superior.
Figure 4.21 Answer Back Model Efficiency

(2) For the SDI, using 4 or 8 nodes was the best choice.
Figure 4.22 Successive Deal I Efficiency
(3) For Successive Deal II, using 4 or 8 nodes was the best choice.


Figure 4.23 Successive Deal II Efficiency (x-axis: Number of Satellite Input Records Sent Simultaneously)

It is important to note that with the use of an open network, there are great
fluctuations in the amount of time taken to perform a given task. The execution time
depends on the number of users and the percentage of the CPU allocated to
each user. For example, if one user is running a large application on a given station and
another user is using this same station for PVM applications, the execution time will
be increased.

In conclusion, considering all factors discussed above, the Answer Back Model
was the best algorithm. When using four, eight, or sixteen nodes, the Answer Back Method
produced the best Endtoend times, Speedups, and Efficiencies for all sizes of data packet
distributed at one time.
The fastest time resulted with the ABM using eight nodes and sending five satellite
input records at a time. The utilization of 8 nodes gives the maximum parallelization
advantage and the minimum communication overhead. The Answer Back Method required
the slaves to notify the master when ready for more data; this reduced the time spent
waiting for data, and the fastest workers were the ones that processed more data.
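
The observation that the fastest workers process more data under the Answer Back Method can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: communication latency is ignored, each worker has a fixed per-packet cost, and ties go to the lowest-numbered worker. The function `answer_back`, the constant `MAX_WORKERS`, and all parameter names are hypothetical and do not appear in the thesis code.

```c
#include <assert.h>

#define MAX_WORKERS 32

/* Simulate "answer back" dealing: every packet is handed to the worker
 * that becomes free first. cost[w] is the time worker w needs for one
 * packet. On return, dealt[w] holds how many packets worker w processed.
 * Returns the time at which the last packet is finished. */
double answer_back(int packets, int workers, const double cost[], int dealt[])
{
    double free_at[MAX_WORKERS];
    double finish = 0.0;
    int w, p;

    assert(workers > 0 && workers <= MAX_WORKERS && packets >= 0);
    for (w = 0; w < workers; ++w) {
        free_at[w] = 0.0;
        dealt[w] = 0;
    }
    for (p = 0; p < packets; ++p) {
        int next = 0;                    /* worker that answers back first */
        for (w = 1; w < workers; ++w)
            if (free_at[w] < free_at[next])
                next = w;
        free_at[next] += cost[next];     /* it processes the new packet */
        dealt[next] += 1;
        if (free_at[next] > finish)
            finish = free_at[next];
    }
    return finish;
}
```

With two workers whose per-packet costs are 1 and 2 time units and six packets to deal, the faster worker ends up with four packets and the slower one with two, and both finish at time 4.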

In terms of communication overhead, the Successive Deal II Method was superior
to the Successive Deal I and the ABM. The SDI did not have the added communication
between the Master and Slaves that was inherent in the Answer Back Method.
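
The dealing order used by the Successive Deal II master in Appendix A is a plain round-robin over the working nodes, which needs no reply from the workers. A minimal sketch follows; the function name `sdii_worker` is hypothetical, and the modulo form is an equivalent rewriting of the index expression in the appendix code rather than a quotation of it.

```c
#include <assert.h>

/* Index into the tid array of the worker that receives packet number
 * `packet` (0-based) when packets are dealt round-robin. Index 0 is
 * reserved for the collector, so workers are numbered 1..work_nodes. */
int sdii_worker(int packet, int work_nodes)
{
    assert(packet >= 0 && work_nodes > 0);
    return packet % work_nodes + 1;
}
```

With 7 working nodes, packets 0 through 6 go to workers 1 through 7, and packet 7 wraps around to worker 1 again.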

No conclusions can be made regarding the best data packet size to send, because
although sending five input records at a time resulted in the best endtoend time of 73.42
seconds, sending fifty-five records at a time resulted in an endtoend time of
74.85 seconds. Further research would need to be conducted to provide conclusive results
on the optimal size of data packet to be distributed.
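
To make the effect of packet size concrete, the number of full packets and the number of leftover records can be computed the same way the Appendix A master programs do. This is a minimal sketch; the function name `split_packets` is hypothetical, and the catalog size of 8000 used in the note below is only the approximate figure quoted in this thesis.

```c
#include <assert.h>

/* Split numsat records into full packets of packet_size records.
 * Stores the number of full packets in *sets and the number of
 * records that do not fill a packet in *leftover. */
void split_packets(int numsat, int packet_size, int *sets, int *leftover)
{
    assert(numsat >= 0 && packet_size > 0);
    *sets = numsat / packet_size;
    *leftover = numsat - (*sets) * packet_size;
}
```

For 8000 records, a packet size of 5 gives 1600 full packets and no leftover records, while a packet size of 55 gives 145 full packets and 25 leftover records, so the larger packet size sends far fewer messages.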
V. CONCLUSIONS


The goal of this thesis is to illustrate how a network of computer workstations is 
used as a parallel computer to solve a military requirement of tracking 8000 satellites 
daily. 

The Air Force Space Command (AFSPACECOM) satellite computer code ran
approximately 2.6 times faster when parallelized and implemented on the
Parallel Virtual Machine (PVM) using 8 workstations. PVM is a small software package
(~ Mbyte of C source code) that allows a network of computers to appear as a
distributed-memory parallel computer.
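
As a rough worked check of what this speedup implies, and assuming that all 8 workstations are counted as processors ($p = 8$, an assumption made here only for illustration), the efficiency definition $E = S_p / p$ gives approximately:

```latex
E = \frac{S_p}{p} \approx \frac{2.6}{8} \approx 0.33
```

Under that assumption, each workstation delivered roughly one third of its ideal share of the work.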

Many scientists do not use their workstations all the time, and when applications
are to be run they may need more power than a single workstation can provide. The cost of
allocating large computing resources to each user is rising daily; thus, the use of PVM or
a similar product will be standard in the future.

For military applications, this work illustrates how to use PVM to track satellites
using ordinary workstations. A Naval PVM application would be to use a system of
workstations located at various enclaves in the ship to track and destroy incoming threats.
If the ship took a direct hit in one of its enclaves, the crew would be able to choose
unaffected workstations to continue computing, thus reducing the vulnerability of
the ship.

The AFSPACECOM's Simplified General Perturbation Model Four (SGP4) has
been the operational theory since 1976. The SGP4 model uses six classical orbital
elements, a drag factor, and an epoch time to predict a satellite's position and velocity at
a future time.

The SGP4 and its extension, SDP4, are both analytical models. Although the
solutions are not as accurate as numerical techniques, they are computationally less
expensive. A detailed discussion of the SGP4's mathematical theory can be found in
Chapter III.

Currently, D.A. Danielson and B. Neta at the Naval Postgraduate School are 
documenting and testing a semi-analytical satellite motion model developed by Draper 
Lab. This will increase the accuracy while decreasing the computational cost. See 
documentation by Danielson, Early, and Neta (1993) and numerical experiments 
comparing the semi-analytics to numerical and analytical models by Dyer (1993). 

Three algorithms were developed to parallelize the AFSPACECOM code, and the
performance of each algorithm was tested. All three algorithms use a Master and Slave
approach with a separate collector to collect the results and send them back to the
Master. The Master distributes the data to the Slaves. The Slaves perform all the
calculations necessary to produce the position and velocity vectors for each satellite. The
algorithms differ in the manner in which the data is distributed. Each algorithm was
tested using four, eight, and sixteen workstations.
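
As one illustration of how the distribution schemes differ, the phases of the Successive Deal I schedule described in Appendix A can be computed ahead of time. The sketch below is an assumption-laden restatement of that arithmetic, not the thesis code itself: the struct `sdi_plan`, the function `sdi_schedule`, and the field names are hypothetical.

```c
#include <assert.h>

/* Packets dealt in each phase of a Successive Deal I style schedule. */
struct sdi_plan {
    int first_per_worker;   /* phase 1: one packet to each worker          */
    int second_per_worker;  /* phase 2: 1/(2*workers) of the packets left  */
    int third_per_worker;   /* phase 3: equal share of what remains        */
    int sets_left;          /* whole packets that did not divide evenly    */
    int leftover_records;   /* records that do not fill a packet           */
};

struct sdi_plan sdi_schedule(int numsat, int packet_size, int work_nodes)
{
    struct sdi_plan plan;
    int sets;

    assert(numsat >= 0 && packet_size > 0 && work_nodes > 0);
    sets = numsat / packet_size;
    assert(sets >= work_nodes);          /* enough packets for phase 1 */
    plan.leftover_records = numsat - sets * packet_size;

    plan.first_per_worker = 1;
    sets -= work_nodes;

    plan.second_per_worker = sets / (2 * work_nodes);
    sets -= plan.second_per_worker * work_nodes;

    plan.third_per_worker = sets / work_nodes;
    plan.sets_left = sets - plan.third_per_worker * work_nodes;
    return plan;
}
```

For 8000 records, packets of 5 records, and 7 working nodes, each worker receives 1 packet, then 113 packets, then 114 packets, and 4 whole packets remain to be sent last.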

The algorithm that required the Slaves to notify the Master when ready for more
data resulted in the best times; this method is called the Answer Back Method or ABM.
In the ABM, the Slaves spent less time waiting for more data to process,
which resulted in the fastest workers processing the most data. When using four, eight, or
sixteen workstations, the ABM produced the best total times, speedups, and efficiencies.

One area of further research would include the use of more than sixteen 
workstations and an algorithm designed to reduce the bottleneck created by the collecting 
node. Perhaps, the use of two or more collectors would be advantageous. Additionally, 
further research should be conducted to provide conclusive results on the optimal size 
data packet to be distributed. 

Some of the curves exhibit large fluctuations; this is probably due to changes in
the number of users on the system at the time the data was collected. Further research
should be conducted to test the extent to which the results are reproducible.

The effect of writing the results to an output file was not considered in this 
research. Any research conducted in the future should examine the results produced when 
including the time required to write to an output file. 

In conclusion, the results of this thesis confirm that PVM can be used to track
earth-orbiting satellites. The use of workstations for parallel processing taps otherwise
unused power and decreases the amount of computational time required. As the number of
objects to be tracked and the computational power required increase, this work will
become increasingly important.


APPENDIX A : SOURCE CODE 


/************************************************************************
 * sat_master_ab.c                          LAST UPDATE: Oct 5 1993
 * LT S.K. Brewer
 * This is the master program for the Answer Back Method. It uses PVM
 * to simulate a 2D torus of processors; n+1 slaves are spawned, of
 * which n are working nodes and 1 is the collecting node.
 * Satellite data is issued to the workers in "Answer Back" fashion,
 * sending new data to a working node only when the node is ready.
 * Timing data, collected for statistical purposes only, are placed in
 * the file "timing.ans" which will be placed in the directory from
 * which this master program is invoked.
 ************************************************************************/
#include <stdio.h>      /* INCLUDE STANDARD I/O FUNCTIONS */
#include <stdlib.h>     /* exit() */
#include "pvm3.h"       /* INCLUDE PVM FUNCTIONS */
#include <sys/time.h>
#include <time.h>
#include <math.h>
#include <sys/types.h>

#define SLAVENAME "at.run"

int main(argc, argv)    /* GET FILE NAME FROM COMMAND LINE */
int  argc;
char *argv[];
{
  int    num_nodes=3;       /* NUMBER OF SLAVE NODES */
  int    num_satdata=15;    /* input data records distributed */
  int    num_elements=22;   /* NUMBER OF elements in each data record */
  double sat[10000][22];    /* ARRAY OF satellite input records */
  int    its,nod,size,delta=5;
  int    num,mytid,i=0,j,k,tids[32],msgtag,reading=1;
  int    numsat=0, collector, leftover, worker, sets, work_nodes,done=0;
  struct timeval ts[4];     /* Number of time stamps */
  int    who;
  float  endtoend,tcomm, average=0.0,avcoll=0.0,avcomm=0.0,avcalc=0.0;
  float  cmtime, commtime, cctime, calctime, readtime, c_comm,avpcm=0.0;
  float  avpcl=0.0,aa=0.0;
  FILE   *infile, *timing;
  int    msgtag99=99;


  gettimeofday(&ts[0], (struct timeval *)0); /* BEGIN READING DATA FILE */

  /* OPEN DATA FILE */
  if ((infile = fopen(argv[1], "r")) == NULL)
  { printf("infile = %s did not open\n", argv[1]);
    exit(1);
  }
  /* READ ENTIRE DATA FILE AT ONCE */
  while(reading != EOF)
  { if ((reading = fscanf(infile, "%lf", &sat[numsat][0])) != EOF)
      for (j=1; j<num_elements; ++j)
        fscanf(infile, "%lf", &sat[numsat][j]);
    numsat=numsat+1;   /* COUNT NUMBER OF SATELLITES IN DATA FILE */
  }
  fclose(infile);
  numsat=numsat-1;     /* the final pass hit EOF and read no record */

  gettimeofday(&ts[1], (struct timeval *)0); /* END READING DATA FILE */
  /* SET UP FILE FOR TIMING STATISTICS */
  timing = fopen("timing.ans", "a");

  readtime = (ts[1].tv_sec-ts[0].tv_sec)*1000000+ts[1].tv_usec-ts[0].tv_usec;
  /* readtime is a float, so it is printed with a floating-point format */
  fprintf(timing, "Time to read data file = %.0f microseconds\n", readtime);
  for(size=0; size<55; size+=delta)
  {
    num_satdata = size + 5;
    for(nod=0; nod<3; ++nod)
    {
      if(nod == 0)
        num_nodes = 3;
      else
        if(nod == 1)
          num_nodes = 7;
        else num_nodes = 15;
      fprintf(timing, "sats,nodes, endtoend collector_comm worker_comm worker_calc\n");
      fprintf(timing, "%d %d\n",num_satdata,num_nodes);

      for(its=0; its<1; ++its)
      {
        gettimeofday(&ts[2], (struct timeval *)0); /* BEGIN END TO END TIME */
        /* ENROLL IN PVM */
        mytid = pvm_mytid();

        /* START UP SLAVE TASKS */
        num=pvm_spawn(SLAVENAME, (char**)0, 0, "", num_nodes, tids);
        collector=tids[0];

        /* SEND SLAVES THEIR INDICES INTO THE TID ARRAY */
        msgtag=1;
        for (i=0; i<num_nodes; ++i)
        { pvm_initsend(PvmDataRaw);
          pvm_pkint(&i,1,1);
          if (i==0)
            pvm_pkint(&numsat, 1, 1);    /* TELL COLLECTOR NUMBER OF SATS */
          else
            pvm_pkint(&collector, 1, 1); /* TELL WORKERS COLLECTOR'S ADDRESS */
          pvm_send(tids[i], msgtag);
        }
        /* SEND SETS OF SATELLITE DATA TO WORKERS, WAITING FOR ANSWER BACK */
        msgtag=2;
        k=0;
        work_nodes=num_nodes-1;
        sets=numsat/num_satdata;
        leftover=numsat-sets*num_satdata;
        i=0;
        for (j=1;j<num_nodes; ++j) /* DEAL ONE SET OF SATELLITES TO EACH WORKER */
        { pvm_initsend(PvmDataRaw);
          pvm_pkint(&num_satdata,1,1);
          for (k=0; k<num_satdata; ++k)
          { pvm_pkdouble(sat[i], num_elements,1);
            i=i+1;
          }
          pvm_send(tids[j], msgtag);
          sets=sets-1;
        }
        while(sets>0) /* DEAL REMAINING SETS TO WORKERS AS THE NODES BECOME FREE */
        { pvm_initsend(PvmDataRaw);
          pvm_pkint(&num_satdata,1,1);
          for (k=0; k<num_satdata; ++k)
          { pvm_pkdouble(sat[i], num_elements,1);
            i=i+1;
          }
          sets=sets-1;
          pvm_recv(-1, msgtag99);     /* wait for any worker to answer back */
          pvm_upkint(&who,1,1);
          pvm_send(tids[who],msgtag);
        }
        if (leftover>0) /* SEND LEFTOVERS TO WHOEVER IS READY NEXT */
        { pvm_initsend(PvmDataRaw);
          pvm_pkint(&leftover,1,1);
          for (k=0; k<leftover; ++k)
          { pvm_pkdouble(sat[i], num_elements,1);
            i=i+1;
          }
          pvm_recv(-1,msgtag99);
          pvm_upkint(&who,1,1);
          pvm_send(tids[who],msgtag);
        }


        pvm_initsend(PvmDataRaw);
        pvm_pkint(&done, 1, 1);  /* TELL WORKERS NO MORE DATA IS COMING */
        pvm_mcast(tids, num_nodes, msgtag);

        msgtag=5; /* RECEIVE PROGRAM COMPLETE SIGNAL FROM COLLECTOR */
        pvm_recv(-1,msgtag);

        /* COMPLETE END TO END TIME */
        gettimeofday(&ts[3], (struct timeval *)0);

        /* GATHER TIMING STATISTICS FROM SLAVES */
        /* NOTE: the timing values are floats passed through the long   */
        /* unpack routine; this assumes sizeof(long) == sizeof(float).  */
        msgtag=4;
        calctime=0.0; commtime=0.0;   /* start the per-run sums at zero */
        for (i=0; i<num_nodes; ++i)
        { pvm_recv(-1,msgtag);
          pvm_upkint(&who,1,1);
          if (who == 0)   /* TIMES FROM COLLECTOR */
          {
            pvm_upklong(&c_comm,1,1);
          }
          else            /* TIMES FROM WORKERS */
          {
            pvm_upklong(&cctime,1,1);
            calctime=calctime+cctime;
            pvm_upklong(&cmtime,1,1);
            commtime=commtime+cmtime;
          }
        }
        pvm_exit();
        /* COMPUTE OVERALL TIMING STATISTICS */
        endtoend=(float)(ts[3].tv_sec-ts[2].tv_sec)*1000000+
                 (float)ts[3].tv_usec-(float)ts[2].tv_usec;
        /*convert to seconds*/
        c_comm=c_comm/1.0E6;
        endtoend=endtoend/1.0E6;
        commtime=commtime/1.0E6;
        calctime=calctime/1.0E6;
        /* TOTAL TIME */
        average = average + endtoend;
        avcoll = avcoll + c_comm;     /*collector communication time*/
        avcomm = avcomm + commtime;   /*worker communication time*/
        avcalc = avcalc + calctime;   /*worker calculation time*/
        fprintf(timing, "%6.2f %6.2f %6.2f %6.2f\n", endtoend,c_comm,commtime,calctime);
      }
      average = average/its;
      avcoll = avcoll/its;
      avcomm = avcomm/its;
      avcalc = avcalc/its;
      avpcm=avcomm/(num_nodes-1);
      avpcl=avcalc/(num_nodes-1);
      aa=(avpcm/avpcl)*100;
      /* print results to output file - not shown in this code */
    }
  }
  fclose(timing);
  printf("ENTIRE SEQUENCE COMPLETE");
}
/************************************************************************
 * sat_slave_ab.c                           LAST UPDATE: 05 OCT 1993
 * Susan Brewer
 * This is the slave program for the Answer Back Method.
 * It uses PVM to simulate a 2D torus of processors.
 * The slave with index 0 will be the collecting node.
 * This program "answers back" for more data.
 * The Fortran sub-routine "sgp4m" is called to perform the
 * calculations for orbit prediction.
 ************************************************************************/
#include "pvm3.h"       /* INCLUDE PVM FUNCTIONS */
#include <stdio.h>
#include <sys/time.h>
#include <time.h>
#include <math.h>
#include <sys/types.h>
main()
{
  double results[7*100+1];   /* ARRAY OF RESULTS */
  int    num_elements=22;    /* FIELDS IN INPUT SATELLITE RECORD */
  double sat_data[22];       /* ONE SATELLITE INPUT RECORD */
  int    max=8000, sats=1;
  int    sat_no;
  int    i, j, k, r_length;  /* COUNTERS */
  int    tids[32];           /* ARRAY OF PROCESSOR IDS */
  int    mytid, numnode;     /* MY PROCESSOR ID */
  int    me, collector;      /* MY INDEX INTO THE TIDS ARRAY */
  int    master,msgtag, msgtag2=2, msgtag3=3, msgtag99=99;
  struct timeval ts[4];
  int    res_sets=0;
  float  s=0.0, u=0.0, totaltime, calc, comm;
  extern sgp4m_();   /* EXTERNAL SUB-ROUTINE FOR ORBIT PREDICTION */
  mytid = pvm_mytid();       /* ENROLL IN PVM */
  master=pvm_parent();
  /* RECEIVE MY INDEX AND COLLECTOR'S TID FROM MASTER */
  gettimeofday(&ts[0], (struct timeval *)0);
  msgtag = 1;
  pvm_recv(-1, msgtag);
  pvm_upkint(&me, 1, 1);         /* GET MY INDEX IN THE ARRAY OF TIDs */
  pvm_upkint(&collector, 1, 1);  /* GET THE COLLECTING NODE'S TID */
  if (me == 0)    /* IF I AM THE COLLECTING NODE: */
  {
    for(i=0; i<max; ++i)
    {
      pvm_recv(-1, msgtag3);
      pvm_upkint(&sat_no, 1, 1);   /* RECEIVE RESULT SETS */
      pvm_upkint(&r_length, 1, 1);
      pvm_upkdouble(results, r_length, 1);
    }
    msgtag=5;     /* TELL MASTER ALL RESULTS HAVE BEEN received */
    pvm_initsend(PvmDataRaw);
    pvm_send(master, msgtag);
  }
  else            /* IF I AM A WORKING NODE: */
  {
    while(sats>0) /* REPEAT UNTIL MASTER SENDS DONE SIGNAL */
    { pvm_recv(-1, msgtag2);
      pvm_upkint(&sats, 1, 1);
      for (i=0; i<sats; ++i)
      { pvm_upkdouble(sat_data, num_elements,1);
        sat_no=(int)sat_data[1];
        gettimeofday(&ts[2], (struct timeval *)0);
        sgp4m_(sat_data, results);      /* CALL SUB-ROUTINE */
        gettimeofday(&ts[3], (struct timeval *)0);
        s=s+ts[3].tv_sec-ts[2].tv_sec;
        u=u+ts[3].tv_usec-ts[2].tv_usec;
        r_length=7*(int)results[0]+1;   /* NUMBER OF RESULTS RECORDS */
        pvm_initsend(PvmDataRaw);
        pvm_pkint(&sat_no, 1, 1);       /* SATELLITE NUMBER */
        pvm_pkint(&r_length, 1, 1);
        pvm_pkdouble(results, r_length, 1);  /* PACK */
        pvm_send(collector, msgtag3);        /* SEND */
        pvm_initsend(PvmDataRaw); /* TELL MASTER I'M READY FOR MORE DATA */
        pvm_pkint(&me, 1, 1);
        pvm_send(master, msgtag99);
      }
    }
  }
  /* TIMING STATISTICS TO BE SENT TO MASTER */
  gettimeofday(&ts[1], (struct timeval *)0);
  totaltime=(float)(ts[1].tv_sec-ts[0].tv_sec)*1000000+
            (float)ts[1].tv_usec-(float)ts[0].tv_usec;
  calc = s*1000000 + u;
  comm = totaltime - calc;
  msgtag=4;
  pvm_initsend(PvmDataRaw);
  pvm_pkint(&me, 1,1);
  if(me == 0)
  {
    pvm_pklong(&totaltime,1,1);
  }
  else
  {
    pvm_pklong(&calc,1,1); pvm_pklong(&comm,1,1);
  }

  pvm_send(master,msgtag);
  pvm_exit();
}
/************************************************************************
 * sat_master_SDI.c                         LAST UPDATE: Oct 12 1993
 * LT S.K. BREWER
 * This is the master program for the Successive Deal Method I.
 * It uses PVM to simulate a 2D torus of processors; n+1 slaves
 * are spawned, of which n are working nodes and 1 is the
 * collecting node. Satellite data is issued to the workers by
 * first dealing one data package (num_satdata) to each worker,
 * then dealing 1/(2*working nodes) times the number of data sets
 * left (num_sets), followed by a final deal of equal packets to
 * each worker. Any leftover records are sent last. Timing data,
 * collected for statistical purposes only, are placed in the
 * file "timing" which will be placed in the directory from which
 * this master program is invoked.
 ************************************************************************/
#include <stdio.h>      /* INCLUDE STANDARD I/O FUNCTIONS */
#include <stdlib.h>     /* exit() */
#include "pvm3.h"       /* INCLUDE PVM FUNCTIONS */
#include <sys/time.h>
#include <time.h>
#include <math.h>
#include <sys/types.h>
#define SLAVENAME "t.run"
int main(argc, argv)    /* GET FILE NAME FROM COMMAND LINE */
int  argc;
char *argv[];
{
  int    num_nodes;          /* NUMBER OF SLAVE NODES */
  int    num_satdata;        /* NUMBER OF input data records */
  int    num_elements=22;    /* NUMBER OF elements */
  double sat[10000][22];     /* ARRAY */
  int    its,nod,size,delta=5;
  int    num,mytid,i=0,j=0,k=0,s=0,tids[32],msgtag;
  int    numsat=0, collector, reading=1;
  int    leftover=0,setsleft=0,worker=0, sets=0,num_sets=0;
  int    work_nodes=0, done=0;
  struct timeval ts[4];      /* Time Stamps required */
  int    who;
  float  endtoend=0.0,tcomm=0.0,average=0.0,avcoll=0.0;
  float  avcomm=0.0,avcalc=0.0,c_comm,avpcm=0.0,avpcl=0.0,aa=0.0;
  float  cmtime, commtime=0.0, cctime, calctime=0.0, readtime;
  FILE   *infile, *timing;


  /* BEGIN READING DATA FILE */
  gettimeofday(&ts[0], (struct timeval *)0);
  /* OPEN DATA FILE */
  if ((infile = fopen(argv[1], "r")) == NULL)
  { printf("infile = %s did not open\n", argv[1]);
    exit(1);
  }
  /* READ ENTIRE DATA FILE AT ONCE */
  while(reading != EOF)
  { if ((reading = fscanf(infile, "%lf", &sat[numsat][0])) != EOF)
      for (j=1; j<num_elements; ++j)
        fscanf(infile, "%lf", &sat[numsat][j]);
    numsat=numsat+1;   /* NUMBER OF SATELLITES IN DATA FILE */
  }
  fclose(infile);
  numsat=numsat-1;     /* the final pass hit EOF and read no record */
  /* END READING DATA FILE */
  gettimeofday(&ts[1], (struct timeval *)0);
  /* SET UP FILE FOR TIMING STATISTICS */
  timing = fopen("timing", "a");
  readtime = (ts[1].tv_sec-ts[0].tv_sec)*1000000+
             ts[1].tv_usec-ts[0].tv_usec;
  /* readtime is a float, so it is printed with a floating-point format */
  fprintf(timing, "Time to read data file = %.0f microseconds\n",readtime);
  for(size=0; size<55; size+=delta)
  {
    num_satdata = size + 5;
    for(nod=0; nod<3; ++nod)
    {
      if(nod == 0)
        num_nodes = 3;
      else
        if(nod == 1)
          num_nodes = 7;
        else num_nodes = 15;
      for(its=0; its<10; ++its)
      {
        leftover=0;
        setsleft=0;
        sets=0;
        num_sets=0;
        gettimeofday(&ts[2], (struct timeval *)0); /* BEGIN END TO END TIME */

        /* ENROLL IN PVM */
        mytid = pvm_mytid();

        /* START UP SLAVE TASKS */
        num=pvm_spawn(SLAVENAME, (char**)0, 0, "", num_nodes, tids);
        collector=tids[0];
        /* SEND SLAVES THEIR INDICES INTO THE TID ARRAY */
        msgtag=1;

        for (i=0; i<num_nodes; ++i)
        { pvm_initsend(PvmDataRaw);
          pvm_pkint(&i,1,1);
          if (i==0)
            pvm_pkint(&numsat, 1, 1);
          else
            pvm_pkint(&collector, 1, 1);
          pvm_send(tids[i], msgtag);
        }
        /* SEND SETS OF SATELLITE DATA TO WORKERS */
        msgtag=2;
        k=0;
        work_nodes=num_nodes-1;
        sets=numsat/num_satdata;
        leftover=numsat-sets*num_satdata;
        i=0;
        for (j=1;j<num_nodes; ++j) /* DEAL SET OF SATS TO EACH WORKER */
        { pvm_initsend(PvmDataRaw);
          pvm_pkint(&num_satdata,1,1);
          for (k=0; k<num_satdata; ++k)
          { pvm_pkdouble(sat[i], num_elements,1);
            i=i+1;
          }
          pvm_send(tids[j], msgtag);
        }
        sets=sets-work_nodes;
        num_sets=sets/(2*work_nodes);

        for(j=1; j<num_nodes; ++j)   /* Deal 1/2p records */
        {
          for(s=0; s<num_sets; ++s)
          {
            pvm_initsend(PvmDataRaw);
            pvm_pkint(&num_satdata,1,1);
            for (k=0; k<num_satdata; ++k)
            {
              pvm_pkdouble(sat[i], num_elements,1);
              i=i+1;
            }
            pvm_send(tids[j],msgtag);
          }
        }

        sets=sets-(num_sets*work_nodes);
        num_sets=sets/work_nodes;
        setsleft=sets-(num_sets*work_nodes);
        /* Deal remaining records in equal packets */
        for(j=1; j<num_nodes; ++j)
        {
          for(s=0; s<num_sets; ++s)
          {
            pvm_initsend(PvmDataRaw);
            pvm_pkint(&num_satdata,1,1);
            for (k=0; k<num_satdata; ++k)
            {
              pvm_pkdouble(sat[i], num_elements,1);
              i=i+1;
            }
            pvm_send(tids[j],msgtag);
          }
        }
        if (setsleft>0)   /*send leftover sets*/
        {
          for(s=0; s<setsleft; ++s)
          {
            pvm_initsend(PvmDataRaw);
            pvm_pkint(&num_satdata,1,1);
            for (k=0; k<num_satdata; ++k)
            {
              pvm_pkdouble(sat[i], num_elements,1);
              i=i+1;
            }
            pvm_send(tids[1],msgtag);
          }
        }
        if (leftover>0)   /* send leftover records*/
        {
          pvm_initsend(PvmDataRaw);
          pvm_pkint(&leftover,1,1);
          for (j=0; j<leftover; ++j)
          {
            pvm_pkdouble(sat[i], num_elements,1);
            i=i+1;
          }
          pvm_send(tids[1],msgtag);
        }
        pvm_initsend(PvmDataRaw);
        pvm_pkint(&done, 1, 1);  /* TELL WORKERS NO MORE DATA IS COMING */
        pvm_mcast(tids,num_nodes, msgtag);

        msgtag=5; /* RECEIVE PROGRAM COMPLETE SIGNAL FROM COLLECTOR */
        pvm_recv(-1,msgtag);

        gettimeofday(&ts[3], (struct timeval *)0); /* END TO END TIME */

        /* GATHER TIMING STATISTICS FROM SLAVES */
        /* NOTE: the timing values are floats passed through the long   */
        /* unpack routine; this assumes sizeof(long) == sizeof(float).  */
        msgtag=4;
        for (i=0; i<num_nodes; ++i)
        { pvm_recv(-1,msgtag);
          pvm_upkint(&who,1,1);
          if (who == 0)   /* TIMES FROM COLLECTOR */
          {
            pvm_upklong(&c_comm,1,1);  /* TIME COLLECTOR COMM */
          }
          else            /* TIMES FROM WORKERS */
          {
            pvm_upklong(&cctime,1,1);
            calctime=calctime+cctime;
            pvm_upklong(&cmtime,1,1);
            commtime=commtime+cmtime;
          }
        }

        pvm_exit();
        /* COMPUTE OVERALL TIMING STATISTICS */
        /*COMM TIME*/
        endtoend=(float)(ts[3].tv_sec-ts[2].tv_sec)*1000000+
                 (float)ts[3].tv_usec-(float)ts[2].tv_usec;

        /*convert to seconds*/
        c_comm=c_comm/1.0E6;
        endtoend=endtoend/1.0E6;
        commtime=commtime/1.0E6;
        calctime=calctime/1.0E6;
        /* TOTAL TIME */
        average = average + endtoend;
        avcoll = avcoll + c_comm;     /*collector communication time*/
        avcomm = avcomm + commtime;   /*worker communication time*/
        avcalc = avcalc + calctime;   /*worker calculation time*/
        endtoend = 0.0;calctime = 0.0;commtime = 0.0;c_comm = 0.0;
      }
      average = average/its;
      avcoll = avcoll/its;
      avcomm = avcomm/its;
      avcalc = avcalc/its;
      avpcm=avcomm/(num_nodes-1);
      avpcl=avcalc/(num_nodes-1);
      aa=(avpcm/avpcl)*100;

      /* Print results to output file - not shown in this code */
      average=0.0;
      avcoll=0.0;
      avcomm=0.0;
      avcalc=0.0;
      avpcm=0.0;
      avpcl=0.0;
      aa=0.0;
    }
  }
  fclose(timing);
  printf("ENTIRE SEQUENCE COMPLETE - results have been appended to timing");
}
/************************************************************************
 * sat_slave_SDI.c                          LAST UPDATE: 12 OCT 1992
 * LT S.K. BREWER
 * This is the slave program for Successive Deal I.
 * It uses PVM to simulate a 2D torus of processors.
 * The slave with index 0 will be the collecting node.
 * The Fortran sub-routine "sgp4m" is called to perform
 * the calculations for orbit prediction.
 ************************************************************************/
#include "pvm3.h"       /* INCLUDE PVM FUNCTIONS */
#include <stdio.h>
#include <sys/time.h>
#include <time.h>
#include <math.h>
#include <sys/types.h>
main()
{
  double results[7*100+1];   /* ARRAY OF RESULTS */
  int    num_elements=22;    /* NUMBER OF FIELDS */
  double sat_data[22];       /* ONE SATELLITE INPUT RECORD */
  int    sats=1,maxsats;
  int    sat_no;
  int    i, j, k, r_length;  /* COUNTERS */
  int    tids[32];           /* ARRAY OF PROCESSOR IDS */
  int    mytid, numnode;     /* MY PROCESSOR ID */
  int    me, collector;      /* MY INDEX INTO THE TIDS ARRAY */
  int    master,msgtag, msgtag2=2, msgtag3=3;
  struct timeval ts[4];
  float  s=0.0, u=0.0, totaltime, calc, comm;
  extern sgp4m_();           /* EXTERNAL SUB-ROUTINE */
  mytid = pvm_mytid();       /* ENROLL IN PVM */
  master=pvm_parent();
  /* RECEIVE MY INDEX AND COLLECTOR'S TID FROM MASTER */
  gettimeofday(&ts[0], (struct timeval *)0);
  msgtag = 1;
  pvm_recv(-1, msgtag);
  pvm_upkint(&me, 1, 1);     /* GET MY INDEX IN THE ARRAY OF TIDs */
  pvm_upkint(&collector, 1, 1);
  if (me == 0)    /* IF I AM THE COLLECTING NODE */
  {
    maxsats=collector;  /* for index 0 the master packs numsat in this slot */
    for(i=0; i<maxsats; ++i)
    {
      pvm_recv(-1, msgtag3);
      pvm_upkint(&sat_no, 1, 1);   /* RECEIVE RESULT Sets */
      pvm_upkint(&r_length, 1, 1);
      pvm_upkdouble(results, r_length, 1);
    }
    msgtag=5;     /* TELL MASTER ALL RESULTS HAVE BEEN received */
    pvm_initsend(PvmDataRaw);
    pvm_send(master, msgtag);
  }
  else            /* IF I AM A WORKING NODE */
  {
    while(sats>0) /* REPEAT UNTIL MASTER SENDS DONE SIGNAL */
    { pvm_recv(-1, msgtag2);
      pvm_upkint(&sats, 1, 1);
      for (i=0; i<sats; ++i)
      { pvm_upkdouble(sat_data, num_elements,1);
        sat_no=(int)sat_data[1];
        gettimeofday(&ts[2], (struct timeval *)0);
        sgp4m_(sat_data, results);      /* CALL SUB-ROUTINE */
        gettimeofday(&ts[3], (struct timeval *)0);
        s=s+ts[3].tv_sec-ts[2].tv_sec;
        u=u+ts[3].tv_usec-ts[2].tv_usec;
        r_length=7*(int)results[0]+1;
        pvm_initsend(PvmDataRaw);
        pvm_pkint(&sat_no, 1, 1);       /* SATELLITE NUMBER */
        pvm_pkint(&r_length, 1, 1);
        pvm_pkdouble(results, r_length, 1);  /* PACK */
        pvm_send(collector, msgtag3);        /* SEND */
      }
    }
  }
  /* TIMING STATISTICS TO BE SENT TO MASTER */
  gettimeofday(&ts[1], (struct timeval *)0);
  totaltime=(float)(ts[1].tv_sec-ts[0].tv_sec)*1000000+
            (float)ts[1].tv_usec-(float)ts[0].tv_usec;

  calc = s*1000000 + u;
  comm = totaltime - calc;
  msgtag=4;

  pvm_initsend(PvmDataRaw);
  pvm_pkint(&me, 1,1);
  if (me == 0)
  {
    pvm_pklong(&totaltime,1,1);
  }
  else
  {
    pvm_pklong(&calc,1,1); pvm_pklong(&comm,1,1);
  }
  pvm_send(master,msgtag); pvm_exit();
}
/************************************************************************
 * sat_master_SDII.c                        LAST UPDATE: Oct 13 1993
 * LT S.K. Brewer
 * This is the master program for the Successive Deal Method II.
 * It uses PVM to simulate a 2D torus of
 * processors; n+1 slaves are spawned, of which n
 * are working nodes and 1 is the collecting node.
 * Satellite data is issued to the workers by
 * constantly dealing out equal size data packs.
 * Timing data, collected for statistical purposes,
 * are placed in the file "timrr" which will be
 * placed in the directory from which this master
 * program is invoked.
 ************************************************************************/


#include <stdio.h>      /* INCLUDE STANDARD I/O FUNCTIONS */
#include <stdlib.h>     /* exit() */
#include "pvm3.h"       /* INCLUDE PVM FUNCTIONS */
#include <sys/time.h>
#include <time.h>
#include <math.h>
#include <sys/types.h>

#define SLAVENAME "t.run"
int main(argc, argv)    /* GET FILE NAME FROM COMMAND LINE */
int  argc;
char *argv[];
{
  int    num_nodes;          /* NUMBER OF SLAVE NODES */
  int    num_satdata;        /* # input records dealt */
  int    num_elements=22;
  double sat[10000][22];     /* ARRAY */
  int    its,nod,size,delta=5;
  int    num, mytid, i=0, j, k, tids[32], msgtag;
  int    numsat=0, collector, leftover, worker, sets;
  int    work_nodes, done=0,reading=1;
  struct timeval ts[4];      /* Number of time stamps */
  int    who;
  float  endtoend,tcomm, average=0.0,avcoll=0.0;
  float  avcomm=0.0,avcalc=0.0, readtime, c_comm, avpcm=0.0;
  float  cmtime, commtime=0.0, cctime, calctime=0.0, avpcl=0.0,aa=0.0;
  FILE   *infile, *timing;


  /* BEGIN READING DATA FILE */
  gettimeofday(&ts[0], (struct timeval *)0);
  /* OPEN DATA FILE */
  if ((infile = fopen(argv[1], "r")) == NULL)
  { printf("infile = %s did not open\n", argv[1]);
    exit(1);
  }
  /* READ ENTIRE DATA FILE AT ONCE */
  while(reading != EOF)
  { if ((reading = fscanf(infile, "%lf", &sat[numsat][0])) != EOF)
      for (j=1; j<num_elements; ++j)
        fscanf(infile, "%lf", &sat[numsat][j]);
    numsat=numsat+1;   /* COUNT NUMBER OF SATELLITES IN DATA FILE */
  }
  fclose(infile);
  numsat=numsat-1;
  /* END READING DATA FILE */
  gettimeofday(&ts[1], (struct timeval *)0);
j/* SET UP PILE FOR TIMING STATISTICS n/ 
Eimeng = fOpent + timer rs,. sa.) 
readtime = (ts[1].tv_sec-ts[0] .tv_sec) *1000000+ 
ts({1].tv_usec-ts([0].tv_usec; 


  for(size=0; size<55; size+=delta)
  {
    num_satdata = size + 5;
    for(nod=0; nod<3; ++nod)
    {
      if(nod == 0)
        num_nodes = 3;
      else
        if(nod == 1)
          num_nodes = 7;
        else num_nodes = 15;

      for(its=0; its<10; ++its)
      { /* BEGIN END TO END TIME */
        gettimeofday(&ts[2], (struct timeval *)0);
        /* ENROLL IN PVM */
        mytid = pvm_mytid();
        /* START UP SLAVE TASKS */
        num=pvm_spawn(SLAVENAME, (char**)0, 0, "", num_nodes, tids);
        collector=tids[0];
        /* SEND SLAVES THEIR INDICES INTO THE TID ARRAY */
        msgtag=1;
        for (i=0; i<num_nodes; ++i)
        { pvm_initsend(PvmDataRaw);
          pvm_pkint(&i, 1, 1);
          if (i==0)
            pvm_pkint(&numsat, 1, 1);
          else
            pvm_pkint(&collector, 1, 1);
          pvm_send(tids[i], msgtag);
        }
        /* SEND SETS OF SATELLITE DATA TO WORKERS */
        msgtag=2;
        k=0;
        work_nodes=num_nodes-1;
        sets=numsat/num_satdata;
        leftover=numsat-sets*num_satdata;
        for (i=0; i<sets; ++i)
        { worker = i-(i/work_nodes)*work_nodes+1;
          pvm_initsend(PvmDataRaw);
          pvm_pkint(&num_satdata, 1, 1);
          for(j=0; j<num_satdata; ++j)
          { pvm_pkdouble(sat[k], num_elements, 1);
            k=k+1;
          }
          pvm_send(tids[worker], msgtag);
        }
        if (leftover>0)            /* SEND LEFTOVERS */
        { pvm_initsend(PvmDataRaw);
          pvm_pkint(&leftover, 1, 1);
          for(j=0; j<leftover; ++j)
          { pvm_pkdouble(sat[k], num_elements, 1);
            k=k+1;
          }
          pvm_send(tids[work_nodes], msgtag);
        }
        pvm_initsend(PvmDataRaw);
        /* TELL WORKERS NO MORE DATA IS COMING */
        pvm_pkint(&done, 1, 1);
        for(j=1; j<num_nodes; ++j)
        {
          pvm_send(tids[j], msgtag);
        }
        msgtag=5;   /* RECEIVE PROGRAM COMPLETE SIGNAL FROM COLLECTOR */
        pvm_recv(-1, msgtag);
        /* COMPLETE END TO END TIME */
        gettimeofday(&ts[3], (struct timeval *)0);
        /* GATHER TIMING STATISTICS FROM SLAVES */
        msgtag=4;
        for (i=0; i<num_nodes; ++i)
        { pvm_recv(-1, msgtag);
          pvm_upkint(&who, 1, 1);
          if(who == 0)               /* TIMES FROM COLLECTOR */
          {
            pvm_upklong(&c_comm, 1, 1);  /* TIME COLLECTOR SPENT COMMUNICATING */
          }
          else                       /* TIMES FROM WORKERS */
          { pvm_upklong(&cctime, 1, 1);  /* TIME SPENT CALCULATING RESULTS */
            calctime=calctime+cctime;
            pvm_upklong(&cmtime, 1, 1);  /* TIME SPENT COMMUNICATING OR WAITING */
            commtime=commtime+cmtime;
          }
        }
        pvm_exit();



        /* COMPUTE OVERALL TIMING STATISTICS */
        /* COMM TIME */
        endtoend=(float)(ts[3].tv_sec-ts[2].tv_sec)*1000000+
                 (float)ts[3].tv_usec-(float)ts[2].tv_usec;
        /* convert to seconds */
        c_comm=c_comm/1.0E6;
        endtoend=endtoend/1.0E6;
        commtime=commtime/1.0E6;
        calctime=calctime/1.0E6;
        /* TOTAL TIME */
        average = average + endtoend;
        avcoll = avcoll + c_comm;     /* collector communication time */
        avcomm = avcomm + commtime;   /* worker communication time */
        avcalc = avcalc + calctime;   /* worker calculation time */
        endtoend = 0.0; calctime = 0.0; commtime = 0.0; c_comm = 0.0;
      }
      average = average/its;
      avcoll = avcoll/its;
      avcomm = avcomm/its;
      avcalc = avcalc/its;
      avpcm=avcomm/(num_nodes-1);
      avpcl=avcalc/(num_nodes-1);
      aa=(avpcm/avpcl)*100;
      /* Print statistics to output file - not shown in code */
      average=0.0;
      avcoll=0.0;
      avcomm=0.0;
      avcalc=0.0;
      avpcm=0.0;
      avpcl=0.0;
      aa=0.0;
    }
  }
  fclose(timing);
  printf("ENTIRE SEQUENCE COMPLETE ");
}
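
The dealing arithmetic used by the master can be checked in isolation from PVM. The following is a minimal sketch and not part of the thesis code: the names deal_plan, make_deal_plan and worker_for_set are hypothetical, and only the formulas for sets, leftover and worker are taken from the listing above.

```c
#include <assert.h>

/* Hypothetical helper type: how numsat records are dealt in packs. */
struct deal_plan {
    int sets;      /* number of full packs of num_satdata records */
    int leftover;  /* records remaining after the full packs      */
};

/* Same arithmetic as the master:
   sets = numsat/num_satdata, leftover = numsat - sets*num_satdata. */
struct deal_plan make_deal_plan(int numsat, int num_satdata)
{
    struct deal_plan p;
    assert(num_satdata > 0);
    p.sets = numsat / num_satdata;
    p.leftover = numsat - p.sets * num_satdata;
    return p;
}

/* Index into tids[] of the worker that receives pack i.
   Index 0 is the collector, so workers are 1..work_nodes.
   The master writes i-(i/work_nodes)*work_nodes, which equals
   i % work_nodes for i >= 0. */
int worker_for_set(int i, int work_nodes)
{
    assert(i >= 0 && work_nodes > 0);
    return i % work_nodes + 1;
}
```

For example, 8000 records dealt in packs of 30 to 6 workers (num_nodes = 7) give 266 full packs and 20 leftover records. Packs 0 through 5 go to workers 1 through 6, pack 6 wraps around to worker 1, and the leftover records are sent to the last worker, tids[work_nodes].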




/***************************************************************
*                                                              *
*  sat_slave_SDII.c           LAST UPDATE: 13 OCT 1993         *
*                             LT S.K. BREWER                   *
*  This is the slave program for Successive Deal I.            *
*  It uses PVM to simulate a 2D torus of processors.           *
*  The slave with index 0 will be the collecting node.         *
*  The Fortran sub-routine "sgp4m" is called to perform        *
*  the calculations for orbit prediction.                      *
***************************************************************/
#include "pvm3.h"         /* INCLUDE PVM FUNCTIONS */
#include <stdio.h> 
#include <sys/time.h> 
#include <time.h> 
#include <math.h> 
#include <sys/types.h> 
main()
{
  double results[7*100+1];   /* ARRAY OF RESULTS */
  int    num_elements=22;    /* NUMBER OF FIELDS */
  double sat_data[22];       /* ONE SATELLITE INPUT RECORD */
  int    sats=1, maxsats;
  int    sat_no;
  int    i, j, k, r_length;  /* COUNTERS */
  int    tids[16];           /* ARRAY OF PROCESSOR IDS */
  int    mytid, numnode;     /* MY PROCESSOR ID */
  int    me, collector;      /* MY INDEX INTO THE TIDS ARRAY */
  int    master, msgtag, msgtag2=2, msgtag3=3;
  struct timeval ts[4];
  float  s=0.0, u=0.0, totaltime, calc, comm;
  extern sgp4m_();           /* EXTERNAL SUB-ROUTINE */

  mytid = pvm_mytid();       /* ENROLL IN PVM */
  master=pvm_parent();
  /* RECEIVE MY INDEX AND COLLECTOR'S TID FROM MASTER */


  gettimeofday(&ts[0], (struct timeval *)0);
  msgtag = 1;
  pvm_recv(-1, msgtag);
  pvm_upkint(&me, 1, 1);          /* GET MY INDEX IN THE ARRAY OF TIDs */
  pvm_upkint(&collector, 1, 1);
  if(me == 0)                     /* IF I AM THE COLLECTING NODE */
  {
    maxsats=collector;
    for(i=0; i<maxsats; ++i)
    {
      pvm_recv(-1, msgtag3);
      pvm_upkint(&sat_no, 1, 1);  /* RECEIVE RESULT SETS */
      pvm_upkint(&r_length, 1, 1);
      pvm_upkdouble(results, r_length, 1);
    }
    msgtag=5;      /* TELL MASTER ALL RESULTS HAVE BEEN RECEIVED */
    pvm_initsend(PvmDataRaw);
    pvm_send(master, msgtag);
  }
  else                            /* IF I AM A WORKING NODE */
  {
    while(sats>0)      /* REPEAT UNTIL MASTER SENDS DONE SIGNAL */
    { pvm_recv(-1, msgtag2);
      pvm_upkint(&sats, 1, 1);
      for (i=0; i<sats; ++i)
      { pvm_upkdouble(sat_data, num_elements, 1);
        sat_no=(int)sat_data[1];
        gettimeofday(&ts[2], (struct timeval *)0);
        sgp4m_(sat_data, results);        /* CALL SUB-ROUTINE */
        gettimeofday(&ts[3], (struct timeval *)0);
        s=s+ts[3].tv_sec-ts[2].tv_sec;
        u=u+ts[3].tv_usec-ts[2].tv_usec;
        r_length=7*(int)results[0]+1;
        pvm_initsend(PvmDataRaw);
        pvm_pkint(&sat_no, 1, 1);         /* SATELLITE NUMBER */
        pvm_pkint(&r_length, 1, 1);
        pvm_pkdouble(results, r_length, 1);
        pvm_send(collector, msgtag3);     /* SEND */
      }
    }
  }  /* TIMING STATISTICS TO BE SENT TO MASTER */
  gettimeofday(&ts[1], (struct timeval *)0);
  totaltime=(float)(ts[1].tv_sec-ts[0].tv_sec)*1000000+
            (float)ts[1].tv_usec-(float)ts[0].tv_usec;
  calc = s*1000000 + u;
  comm = totaltime - calc;
  msgtag=4;
  pvm_initsend(PvmDataRaw);
  pvm_pkint(&me, 1, 1);
  if(me == 0)
  {
    pvm_pklong(&totaltime, 1, 1);
  }
  else
  {
    pvm_pklong(&calc, 1, 1); pvm_pklong(&comm, 1, 1);
  }
  pvm_send(master, msgtag); pvm_exit();
}
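
The slave accumulates whole seconds in s and microseconds in u, combines them as s*1000000 + u, and stores the result in a float. A float carries a 24-bit significand, so microsecond counts above roughly 16.7 seconds can no longer be represented exactly. The sketch below shows one possible alternative that does the subtraction in double. The helpers elapsed_usec and comm_usec are hypothetical and are not part of the thesis code.

```c
#include <assert.h>
#include <sys/time.h>

/* Hypothetical helper: elapsed microseconds between two time stamps.
   The arithmetic is done in double, which represents integer
   microsecond counts exactly up to 2^53. */
double elapsed_usec(const struct timeval *start, const struct timeval *end)
{
    double sec, usec;
    assert(start && end);
    sec  = (double)(end->tv_sec  - start->tv_sec);
    usec = (double)(end->tv_usec - start->tv_usec);
    return sec * 1.0e6 + usec;
}

/* Communication (or waiting) time as the slave computes it:
   everything that was not spent inside the orbit calculation. */
double comm_usec(double total_usec, double calc_usec)
{
    return total_usec - calc_usec;
}
```

A value computed this way would be packed with pvm_pkdouble and unpacked with pvm_upkdouble, which would require matching changes on both the slave and the master side.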


/***************************************************************
*                                                              *
*  seq.c            LT S.K. BREWER            OCT 25 93        *
*                                                              *
*  This is a sequential version of the satellite orbit         *
*  prediction program using SGP4.                              *
*                                                              *
***************************************************************/


#include <stdio.h>        /* INCLUDE STANDARD I/O FUNCTIONS */
#include <sys/time.h> 
#include <time.h> 
#include <math.h> 
#include <sys/types.h> 
int main(argc, argv)      /* GET FILE NAME FROM COMMAND LINE */
int   argc;
char  *argv[];
{
  int    iterations=50;
  int    num_elements=22;
  double sat[32000][22];     /* ARRAY OF SATELLITE INPUT DATA */
  int    its;                /* NUMBER OF ITERATIONS OF THE PROGRAM */
  int    i=0, j, k, reading=1;
  int    numsat=0;
  struct timeval ts[4];      /* Number of Time Stamps Required */
  float  endtoend=0.0, average=0.0;
  long   readtime;
  int    sat_no;
  double results[7*100+1];
  FILE   *infile, *timing;
  extern sgp4m_();

  /* BEGIN READING DATA FILE */
  gettimeofday(&ts[0], (struct timeval *)0);   /* OPEN DATA FILE */
  if ((infile = fopen(argv[1], "r")) == NULL)
  { printf("infile = %s did not open\n", argv[1]);
    exit(1);
  }
  /* READ ENTIRE DATA FILE AT ONCE */
  while(reading != EOF)
  { if ((reading = fscanf(infile, "%lf", &sat[numsat][0])) != EOF)
      for (j=1; j<num_elements; ++j)
        fscanf(infile, "%lf", &sat[numsat][j]);
    numsat=numsat+1;   /* COUNT NUMBER OF SATELLITES IN DATA FILE */
  }
  fclose(infile);
  numsat=numsat-1;
  gettimeofday(&ts[1], (struct timeval *)0);   /* END READING DATA FILE */




  /* SET UP FILE FOR TIMING STATISTICS */
  timing = fopen("timing.seq", "a");
  readtime = (ts[1].tv_sec-ts[0].tv_sec)*1000000+
             ts[1].tv_usec-ts[0].tv_usec;
  for(its=0; its<iterations; ++its)
  {
    gettimeofday(&ts[2], (struct timeval *)0);
    for (i=0; i<numsat; ++i)
    { sat_no=(int)sat[i][1];
      sgp4m_(sat[i], results);
    }
    gettimeofday(&ts[3], (struct timeval *)0);
    endtoend=(float)(ts[3].tv_sec-ts[2].tv_sec)*1000000+
             (float)ts[3].tv_usec-(float)ts[2].tv_usec;
    /* convert to seconds */
    endtoend=endtoend/1.0E6;
    /* Write results to timing output file */
    fprintf(timing, "\n Endtoend time (sec) = %6.2f\n", endtoend);
    /* Total Time */
    average=average+endtoend;
  }
  average=average/its;
  fprintf(timing, "\n Average Endtoend time (sec)= %6.2f\n",
          average);
  fclose(timing);
  printf("\nENTIRE SEQUENCE COMPLETE ");
}
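
Both the master and the sequential program read records until fscanf returns EOF and then subtract one from numsat to undo the extra count made on the final pass. A sketch of an alternative loop, which counts a record only when all of its fields were read, is given below. The function name read_records, the macro NUM_ELEMENTS and the max_records parameter are hypothetical and do not appear in the thesis code.

```c
#include <assert.h>
#include <stdio.h>

#define NUM_ELEMENTS 22   /* fields per satellite record, as in the listings */

/* Hypothetical helper: read up to max_records records of NUM_ELEMENTS
   doubles each from fp into sat. Returns the number of complete
   records read; a partial trailing record is not counted. */
int read_records(FILE *fp, double sat[][NUM_ELEMENTS], int max_records)
{
    int n = 0, j;

    assert(fp != NULL && max_records >= 0);
    while (n < max_records) {
        for (j = 0; j < NUM_ELEMENTS; ++j)
            if (fscanf(fp, "%lf", &sat[n][j]) != 1)
                return n;      /* EOF or bad input: stop here */
        ++n;                   /* all fields present: count the record */
    }
    return n;
}
```

The max_records bound also keeps the loop from writing past the end of the sat array when the data file holds more records than the array was declared for.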

